ITEM METADATA RECORD
Title: Unsupervised learning of auditory filter banks using non-negative matrix factorisation
Authors: Bertrand, Alexander ×
Demuynck, Kris
Stouten, Veronique
Van hamme, Hugo #
Issue Date: 2008
Publisher: IEEE
Host Document: Proceedings IEEE international conference on acoustics, speech, and signal processing - ICASSP’2008; pages: 4713-4716
Conference: IEEE international conference on acoustics, speech, and signal processing - ICASSP’2008; location: Las Vegas, Nevada, USA; date: March 30 - April 4, 2008
Abstract: Non-negative matrix factorisation (NMF) is an unsupervised learning technique that decomposes a non-negative data matrix into the product of two lower-rank non-negative matrices. The non-negativity constraint yields a parts-based and often sparse representation of the data. We use NMF to factorise a matrix of spectral slices of continuous speech in order to find a feature set for speech recognition automatically. The resulting decomposition yields a filter bank design with remarkable similarities to perceptually motivated designs, supporting the hypothesis that human hearing and speech production are well matched to each other. We point out that the divergence cost criterion used by NMF depends linearly on energy, which may influence the design; we argue, however, that this does not significantly affect the interpretation of our results. Furthermore, we compare our filter bank with several hearing models found in the literature. Evaluating the filter bank for speech recognition shows that it achieves the same recognition performance as classical MEL features.
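The core technique in the abstract, NMF under a divergence cost, can be illustrated with a small sketch. It assumes the standard generalised Kullback-Leibler divergence and the Lee-Seung multiplicative updates; this record does not state the paper's exact update rules, initialisation or normalisation. The toy matrix is hypothetical: its rows stand for frequency bins and its columns for frames, so under that reading the columns of W play the role of learned filter shapes. The last lines check the energy dependence mentioned in the abstract: multiplying both the data and the model by a factor a multiplies the divergence by a.

```python
import math
import random


def matmul(A, B):
    """Plain-Python product of two nested-list matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_divergence(V, Y, eps=1e-12):
    """Generalised KL divergence D(V || Y) = sum(v*log(v/y) - v + y)."""
    total = 0.0
    for v_row, y_row in zip(V, Y):
        for v, y in zip(v_row, y_row):
            y = max(y, eps)  # guard against log of / division by zero
            total += (v * math.log(v / y) if v > 0 else 0.0) - v + y
    return total


def nmf_kl(V, rank, iterations=200, seed=0, eps=1e-12):
    """Factorise V (m x n, non-negative) as W (m x rank) times H (rank x n).

    Assumed method: Lee-Seung multiplicative updates for the generalised
    KL divergence. W and H stay non-negative because each update
    multiplies by a non-negative ratio.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]

    def ratio(W, H):
        # Element-wise V / (W H), guarded against division by zero.
        WH = matmul(W, H)
        return [[V[i][j] / max(WH[i][j], eps) for j in range(n)] for i in range(m)]

    for _ in range(iterations):
        # H[a][j] *= sum_i W[i][a] * R[i][j] / sum_i W[i][a]
        R = ratio(W, H)
        w_sums = [sum(W[i][a] for i in range(m)) for a in range(rank)]
        H = [[H[a][j] * sum(W[i][a] * R[i][j] for i in range(m)) / max(w_sums[a], eps)
              for j in range(n)] for a in range(rank)]
        # W[i][a] *= sum_j H[a][j] * R[i][j] / sum_j H[a][j]
        R = ratio(W, H)
        h_sums = [sum(H[a]) for a in range(rank)]
        W = [[W[i][a] * sum(H[a][j] * R[i][j] for j in range(n)) / max(h_sums[a], eps)
              for a in range(rank)] for i in range(m)]
    return W, H


# Hypothetical toy "spectrogram": 6 frequency bins x 5 frames built from two
# made-up spectral shapes, so an exact rank-2 factorisation exists.
shapes = [[1.0, 0.1], [0.8, 0.2], [0.3, 0.6], [0.1, 1.0], [0.1, 0.7], [0.2, 0.3]]
gains = [[1.0, 0.5, 0.2, 0.9, 0.1], [0.1, 0.6, 1.0, 0.3, 0.8]]
V = matmul(shapes, gains)

W0, H0 = nmf_kl(V, rank=2, iterations=0)    # random initialisation only
W, H = nmf_kl(V, rank=2, iterations=300)    # same seed, after training
print(kl_divergence(V, matmul(W0, H0)), kl_divergence(V, matmul(W, H)))

# The cost scales linearly with energy: D(aV || aY) = a * D(V || Y).
a = 10.0
Y = matmul(W0, H0)
scaled = kl_divergence([[a * v for v in r] for r in V], [[a * y for y in r] for r in Y])
print(scaled, a * kl_divergence(V, Y))
```

Because the cost is linear in energy, each frame's contribution to the fit grows with its energy, so louder frames weigh more heavily in the learned factors; this is the effect the abstract says may influence the design.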
Description: Bertrand A., Demuynck K., Stouten V., Van hamme H., "Unsupervised learning of auditory filter banks using non-negative matrix factorisation", Proceedings IEEE international conference on acoustics, speech, and signal processing - ICASSP’2008, pp. 4713-4716, March 30 - April 4, 2008, Las Vegas, Nevada, USA.
Publication status: published
KU Leuven publication type: IC
Appears in Collections: ESAT - STADIUS, Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics
ESAT - PSI, Processing Speech and Images
× corresponding author
# (joint) last author

Files in This Item:
File          Description  Status     Size    Format
bertrand.pdf  Full text    Published  141 Kb  Adobe PDF

All items in Lirias are protected by copyright, with all rights reserved.
