COMPUTATIONAL CYBERNETICS AND TECHNICAL INFORMATICS. INTERNATIONAL JOINT CONFERENCE. 2010. (ICCC-CONTI 2010) pages:481-485
IEEE International Joint Conferences on Computational Cybernetics and Technical Informatics location:Timisoara date:27/29 May 2010
In speech recognition there has been a trend to incorporate more and more knowledge about human hearing into the feature extraction step. One such approach is the application of localized spectro-temporal analysis, which is inspired by neurophysiological studies. Here we experiment with extracting features from patches of the widely used critical-band log-energy spectrum by applying the two-dimensional cosine transform. Compared with earlier, similar studies that used the spectrogram representation, we find that our method performs no worse and is faster. In experiments with noisy speech, the proposed representation proves more noise-robust than the conventional mel-frequency cepstral features.
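
The patch-wise transform described above can be illustrated with a minimal sketch. It assumes the critical-band log-energy spectrum is already available as a list of frames, each holding one log-energy per band. The function names (`dct2`, `patch_features`), the patch dimensions, the band step, and the number of retained coefficients are all hypothetical choices made for illustration; they are not the settings used in the paper.

```python
import math


def dct2(patch, keep_rows, keep_cols):
    """Unnormalized 2D DCT-II of a rectangular patch.

    Only the low-order keep_rows x keep_cols coefficients are computed,
    returned as a flat list in row-major order.
    """
    n_rows, n_cols = len(patch), len(patch[0])
    coefs = []
    for u in range(keep_rows):
        for v in range(keep_cols):
            total = 0.0
            for i in range(n_rows):
                cos_i = math.cos(math.pi * (i + 0.5) * u / n_rows)
                for j in range(n_cols):
                    cos_j = math.cos(math.pi * (j + 0.5) * v / n_cols)
                    total += patch[i][j] * cos_i * cos_j
            coefs.append(total)
    return coefs


def patch_features(log_spec, t_center, t_half, band_size, band_step, keep=(3, 3)):
    """Concatenate 2D DCT coefficients of the patches around one frame.

    log_spec  : list of frames, each a list of band log-energies
    t_center  : index of the frame being described
    t_half    : frames taken on each side of t_center (assumed value)
    band_size : number of adjacent bands per patch (assumed value)
    band_step : shift between consecutive patches along frequency (assumed)
    """
    n_bands = len(log_spec[0])
    frames = log_spec[t_center - t_half : t_center + t_half + 1]
    features = []
    for b in range(0, n_bands - band_size + 1, band_step):
        # Each patch is localized in both time and frequency.
        patch = [frame[b : b + band_size] for frame in frames]
        features.extend(dct2(patch, keep[0], keep[1]))
    return features
```

Keeping only the low-order coefficients gives a compact, smoothed description of each patch, analogous to the way the one-dimensional cosine transform is truncated when computing conventional cepstral features. For a spectrum with 8 bands, calling `patch_features(log_spec, 4, 2, 4, 2, keep=(2, 2))` would cover three overlapping band ranges and return 12 values for that frame.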