Connectionist vector quantization in automatic speech recognition

Ma, Weiye; Van Compernolle, Dirk; Wambacq, Patrick

Author:

Ma, Weiye

Van Compernolle, Dirk; Wambacq, Patrick

Keywords:

PSI_SPEECH

Abstract:

In this thesis, we successfully apply connectionist approaches, in particular the Multi-Layer Perceptron (MLP), to speech recognition tasks. We present the back-propagation theory in detail, together with its implementation issues, including a modified weight adaptation algorithm, and we propose a weight updating strategy that speeds up convergence during network training. The training data are balanced phonetically so that the network treats all phonemes equally, and a random database generator is introduced to obtain a robust MLP. We also introduce the fuzzy MLP into speech recognition, using an overlapping Hamming window as the fuzzy membership function for the MLP outputs.

We design and implement an MLP labeler for a Hidden Markov Model (HMM) system, combining the good short-time classification properties of MLPs with the good integration and overall recognition capabilities of discrete HMMs. In this MLP/HMM hybrid system, standard vector quantization is replaced by an MLP labeler that produces phone-like labels. Compared with using MLPs as probability generators for HMMs, our system offers more flexibility in system design because it can use word models instead of phonetic models. Moreover, since the network does not need to be trained to a global minimum, it can have fewer hidden units and can therefore be trained faster. Nor do we need to retrain the MLPs on segmentations generated by a Viterbi alignment. Compared with Euclidean labeling, our method needs fewer HMM parameters per state and obtains higher recognition accuracy. Histograms of the MLP output values for each phonetic class show that winner-take-all MLP labeling ignores the relative output values of the different phonetic classes. We therefore extend our baseline winner-take-all method to several Top-N methods.
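As a minimal sketch of the labeling step, assuming a one-hidden-layer MLP with sigmoid units (the abstract does not give the network topology), the Python below contrasts conventional Euclidean VQ labeling, which picks the nearest codeword, with MLP winner-take-all labeling, which picks the best-scoring phone class, and with a Top-N variant that keeps the N best classes. All function names are hypothetical, and renormalizing the Top-N scores so they sum to one is an illustrative assumption, not necessarily the thesis's Top-N rule. In every case the resulting integer index plays the role of the discrete observation symbol seen by the HMM.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean_label(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Conventional VQ labeling: index of the nearest codeword (squared Euclidean distance)."""
    def sq_dist(k: int) -> float:
        return sum((x - c) ** 2 for x, c in zip(frame, codebook[k]))
    return min(range(len(codebook)), key=sq_dist)


def mlp_forward(frame: Vector,
                w_hidden: Sequence[Vector], b_hidden: Vector,
                w_out: Sequence[Vector], b_out: Vector) -> List[float]:
    """Assumed topology: one hidden layer, sigmoid units, one output per phone class."""
    def sigmoid(a: float) -> float:
        return 1.0 / (1.0 + math.exp(-a))
    hidden = [sigmoid(sum(w * x for w, x in zip(row, frame)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]


def winner_take_all(outputs: Vector) -> int:
    """Baseline MLP labeling: keep only the best-scoring phone class."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


def top_n(outputs: Vector, n: int) -> List[Tuple[int, float]]:
    """Hypothetical Top-N labeling: the n best classes, scores renormalized to sum to one."""
    best = sorted(range(len(outputs)), key=lambda k: outputs[k], reverse=True)[:n]
    total = sum(outputs[k] for k in best)
    return [(k, outputs[k] / total) for k in best]
```

For example, with outputs `[0.1, 0.7, 0.2]`, `winner_take_all` returns class 1 and discards the fact that class 2 scored twice as high as class 0, whereas `top_n(outputs, 2)` keeps classes 1 and 2 with weights of roughly 0.78 and 0.22.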
We discuss a series of MLP/HMM hybrid models designed to make fuller use of the MLP output information and to improve speech recognition performance. The models investigated are MLP multi-dimensional labeling, MLP multi-labeling, MLP fuzzy labeling, multi-MLP multi-labeling, and multi-MLP fuzzy labeling.
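The abstract does not say how multiple or fuzzy labels enter the discrete HMM. One plausible reading, sketched below purely as an assumption, follows the usual fuzzy-VQ approach: the emission probability of a frame in state j is the label-weighted sum b_j(O_t) = Σ_k w_t(k) b_j(k), and several label streams (for example from several MLPs) are treated as independent, so their probabilities multiply. The function names and the forward-algorithm wrapper are hypothetical illustrations, not the thesis's implementation; with a single label of weight 1.0 per frame they reduce to an ordinary discrete HMM.

```python
from typing import List, Sequence, Tuple

# Hypothetical soft label for one frame: (symbol index, weight) pairs,
# e.g. Top-N MLP classes. A hard (winner-take-all) label is [(k, 1.0)].
SoftLabel = Sequence[Tuple[int, float]]


def fuzzy_emission(b_state: Sequence[float], label: SoftLabel) -> float:
    """Assumed fuzzy labeling: label-weighted sum of one state's b_j(k)."""
    return sum(w * b_state[k] for k, w in label)


def multi_stream_emission(b_streams: Sequence[Sequence[float]],
                          symbols: Sequence[int]) -> float:
    """Assumed independent label streams (e.g. one per MLP): product of probabilities."""
    p = 1.0
    for b, k in zip(b_streams, symbols):
        p *= b[k]
    return p


def forward_likelihood(pi: Sequence[float],
                       A: Sequence[Sequence[float]],
                       B: Sequence[Sequence[float]],
                       labels: Sequence[SoftLabel]) -> float:
    """Forward algorithm for a discrete HMM observed through soft labels.

    pi[j]: initial probabilities, A[i][j]: transition probabilities,
    B[j][k]: probability of symbol k in state j. Assumes at least one frame.
    """
    n = len(pi)
    alpha: List[float] = [pi[j] * fuzzy_emission(B[j], labels[0]) for j in range(n)]
    for label in labels[1:]:
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(O_t)
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * fuzzy_emission(B[j], label)
                 for j in range(n)]
    return sum(alpha)
```

Treating the streams as independent keeps the number of parameters per state linear in the number of streams, which is the usual design choice for multi-codebook discrete HMMs; whether the thesis makes the same choice is not stated in the abstract.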