
Annual conference of the International Speech Communication Association (ISCA) - Interspeech 2017, Date: 2017/08/20 - 2017/08/24, Location: Stockholm, Sweden

Publication date: 2017-01-01
Volume: 2017-August
Pages: 1919-1923
ISBN: 978-1-5108-4876-4
Publisher: International Speech Communication Association

Proceedings Interspeech 2017

Authors:

Zegers, Jeroen
Van hamme, Hugo

Keywords:

PSI_SPEECH, Science & Technology, Technology, Computer Science, Artificial Intelligence, Engineering, Electrical & Electronic, Computer Science, Engineering, Source Separation, Single Channel, Blind Multi Speaker Adaptation, SPEAKER ADAPTATION, NEURAL-NETWORKS, DEEP, PSI_4239

Abstract:

Copyright © 2017 ISCA.

Recent developments in deep learning have brought new approaches to the cocktail party problem. Initial results are very promising and invite further research in the domain. One technique that has not yet been explored in the neural network approach to this task is speaker adaptation. Intuitively, information on the speakers that we are trying to separate seems fundamentally important for the speaker separation task. However, retrieving this speaker information is challenging, since the speaker identities are not known a priori and multiple speakers are simultaneously active. This creates a chicken-and-egg problem: estimating speaker information requires separated signals, while separating the signals benefits from speaker information. To tackle this, source signals and i-vectors are estimated alternately. We show that blind multi-speaker adaptation improves the results of the network and that (in our case) the network is not capable of adequately retrieving this useful speaker information itself.
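
The alternating estimation mentioned in the abstract can be illustrated with a toy loop. This is a minimal sketch under strong assumptions, not the authors' implementation: `separate` is a hypothetical stand-in for the separation network (here a simple per-frequency soft mask), `extract_speaker_vectors` is a hypothetical stand-in for i-vector extraction (here a normalised time-averaged spectrum), and the random initialisation and the iteration count are arbitrary choices made for illustration.

```python
import numpy as np


def extract_speaker_vectors(estimates):
    """Hypothetical stand-in for i-vector extraction.

    estimates: array of shape (S, T, F), one magnitude spectrogram per speaker.
    Returns one unit-norm vector per speaker, shape (S, F).
    """
    vecs = estimates.mean(axis=1)  # average over time frames
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)


def separate(mixture, speaker_vectors):
    """Hypothetical stand-in for a speaker-adapted separation network.

    mixture: (T, F) magnitude spectrogram; speaker_vectors: (S, F).
    Builds soft masks that sum to (approximately) one over speakers.
    """
    masks = speaker_vectors / (speaker_vectors.sum(axis=0, keepdims=True) + 1e-8)
    return masks[:, None, :] * mixture[None, :, :]  # (S, T, F)


def alternate_estimation(mixture, num_speakers=2, num_iters=3, seed=0):
    """Alternate between source estimation and speaker-vector estimation."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: random positive speaker vectors, since no
    # speaker information is available before the first separation pass.
    vectors = rng.random((num_speakers, mixture.shape[1])) + 0.1
    for _ in range(num_iters):
        estimates = separate(mixture, vectors)        # step 1: estimate sources
        vectors = extract_speaker_vectors(estimates)  # step 2: re-estimate speaker info
    return estimates, vectors
```

Each pass uses the current speaker vectors to re-estimate the sources and then the new source estimates to refresh the speaker vectors, which is the loop structure the abstract describes; the actual network and i-vector extractor would replace the two placeholder functions.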