Noise Robust Exemplar Matching for Speech Recognition and Enhancement

Publication date: 2015-05-26

Author:

Yilmaz, Emre

Keywords:

PSI_SPEECH

Abstract:

This thesis introduces a novel noise robust automatic speech recognition scheme that adds noise modeling capabilities to exemplar matching-based acoustic modeling. This is achieved by combining exemplar-based sparse representations with exemplar matching. More specifically, the exemplars associated with speech units in exemplar matching-based acoustic modeling are used in a conventional exemplar-based sparse representations formulation. Because the exemplars have multiple lengths, the proposed recognizer uses multiple speech dictionaries, each containing exemplars that are associated with the same speech unit and have the same duration. The inherent noise modeling problem of exemplar matching-based techniques stems from the intractable task of evaluating all possible alignments of speech and noise exemplars in order to perform source separation. In other words, the speech and noise components of a noisy mixture cannot be discovered by comparing it with individual speech and noise exemplars, because the number of possible alignments is enormous. Our approach remedies this problem by approximating noisy speech features as a linear combination of speech and noise exemplars of all available exemplar lengths. Decoding is based on the reconstruction errors of each dictionary, similar to traditional exemplar matching. The initial investigation of the proposed exemplar matching system focuses on clean speech recognition, using only speech exemplars and comparing test segments with exemplars of the same length. These experiments establish the basics of the proposed exemplar matching using a sparse representation model. We then investigate a model that can accommodate time warping in the proposed setting and evaluate the clean speech performance of the system with time warping.
Finally, various exemplar selection criteria are proposed for the undercomplete speech dictionaries, and the decrease in recognition accuracy with increasing pruning rate is explored. The noise robust exemplar matching concept is introduced after the clean speech experiments, with a focus on the noise exemplar extraction technique. This adaptive noise modeling approach considerably increases the recognition performance of the proposed approach, especially under severe noise conditions. In addition, we examine the recognition performance obtained with a more flexible divergence family, namely the alpha-beta (AB) divergence, in place of the conventional generalized Kullback-Leibler divergence. Having two parameters, the AB divergence provides improved robustness against background noise. In the last part of the work, the speech enhancement performance of the proposed framework is investigated by comparing its noise suppression performance with that of baseline enhancement systems. In addition to these experiments, the novel speech enhancement system is employed as the front-end of a conventional GMM-HMM recognition system to evaluate the impact of front-end denoising on recognition performance.
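
The core idea in the abstract (approximate a noisy observation as a non-negative linear combination of speech and noise exemplars, then decode by comparing the reconstruction error of each dictionary) can be illustrated with a small sketch. This is not the thesis implementation. It assumes a single exemplar length, so every dictionary has the same feature dimension. It uses the standard multiplicative update for the generalized Kullback-Leibler divergence with a fixed dictionary. All function and parameter names (`gkl_divergence`, `estimate_activations`, `decode`, `sparsity`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def gkl_divergence(y, y_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence d(y || y_hat), non-negative inputs."""
    y = np.maximum(y, eps)
    y_hat = np.maximum(y_hat, eps)
    return float(np.sum(y * np.log(y / y_hat) - y + y_hat))


def estimate_activations(y, A, n_iter=200, sparsity=0.0, eps=1e-12):
    """Find x >= 0 such that A @ x approximates y, keeping the dictionary A fixed.

    Uses the standard multiplicative update for the generalized KL divergence;
    `sparsity` is an optional L1 penalty weight (an assumption of this sketch).
    """
    x = np.ones(A.shape[1])
    denom = A.sum(axis=0) + sparsity + eps
    for _ in range(n_iter):
        y_hat = A @ x + eps
        x *= (A.T @ (y / y_hat)) / denom
    return x


def decode(y, speech_dicts, noise_dict):
    """Pick the speech unit whose dictionary (plus noise exemplars) best reconstructs y.

    speech_dicts: mapping from a speech-unit label to a (features x exemplars) array.
    noise_dict:   (features x noise exemplars) array shared by all speech units.
    """
    errors = {}
    for label, S in speech_dicts.items():
        A = np.hstack([S, noise_dict])  # combined speech + noise dictionary
        x = estimate_activations(y, A)
        errors[label] = gkl_divergence(y, A @ x)
    best = min(errors, key=errors.get)
    return best, errors
```

Calling `decode(y, {"unit_a": S_a, "unit_b": S_b}, N)` on a non-negative feature vector `y` returns the label with the smallest reconstruction error together with the per-label errors. A full system in the spirit of the abstract would repeat this for every exemplar length and combine the resulting errors during decoding, which this sketch leaves out.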