With advances in technology, human-machine interfaces have become commonplace. Their design requires a great deal of engineering effort to make them functional and accessible. One such effort is the embedding of voice control, which improves accessibility for people with a physical disability. In common speech-enabled command-and-control applications, the spoken commands are restricted to a predefined list of phrases and grammars. These conventions work well as long as the system does not have to stray too far from the conditions considered by the designer or from the characteristics of the training material. Speech technology would benefit from training during usage, learning the specific vocalizations and the emerging expressions of the user. Designing a vocal user interface (VUI) model from this developmental perspective would widen accessibility to cater to users with non-standard or dysarthric speech. The research in this dissertation is aimed at the development of a self-taught VUI that learns speech commands from the user while it is operational. To this end, we adopt and introduce different procedures to build a VUI model that learns from a few learning examples. A learning example consists of two sources of information: the spoken command and the demonstration of the commanded action. Both sources of information are converted to fixed-length utterance-based vectors. The approach links the acoustic patterns embedded in the spoken utterances to the concepts that jointly define the meaning of the utterance. The method represents the data by its recurrent acoustic and semantic patterns and the incidence of these patterns in the data. Since these patterns are embedded in the data, the representation of the data has a significant influence on the performance of the model. A thorough analysis of these representations, resorting to speaker-dependent and speaker-independent data resources, is made.
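As a minimal illustration of converting a variable-length utterance into a fixed-length vector of recurrent-pattern incidences, the sketch below vector-quantises acoustic frames against a toy codebook and counts code co-occurrences at a fixed lag. The codebook, frame dimensions, and lag are illustrative assumptions, not the representation used in the dissertation.

```python
import numpy as np

def utterance_to_fixed_vector(frames, codebook, lag=2):
    # Assign each acoustic frame to its nearest codebook entry (vector quantisation).
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    K = len(codebook)
    # Count co-occurrences of codes `lag` frames apart: a fixed-length
    # histogram of recurrent acoustic patterns, independent of utterance length.
    hist = np.zeros(K * K)
    for a, b in zip(labels[:-lag], labels[lag:]):
        hist[a * K + b] += 1
    return hist

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 13))   # 8 toy codewords over 13-dim features
frames = rng.normal(size=(50, 13))    # one 50-frame utterance
v = utterance_to_fixed_vector(frames, codebook)
```

Whatever the utterance length, the output vector has the same dimensionality (here 64), which is what allows utterances to be linked to concept vectors of fixed size.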
Attention is also given to the representation of the action. The representation of the commanded action consists of an incidence vector representing the semantic content of the demanded action. Users are non-experts in operating a VUI; therefore, errors such as uttering an incomplete command or pushing a wrong button may emerge. We demonstrate robustness against these kinds of errors. Another issue pertaining to semantics is the correlation between relevant concepts in a spoken utterance. This dependency is an additional source of information. We exploit this information and compare different semantic structures pertaining to these semantic dependencies.

With the focus on the learning process rather than on the resulting model, we develop procedures for incremental and adaptive learning. By exploiting a semi-Bayesian procedure called maximum a posteriori (MAP) estimation, the VUI model can be made to learn incrementally, one utterance at a time. Incremental learning procedures are developed at the level of the basic acoustic atoms and at the level of the word models. They are compared with their batch learning variants and yield comparable accuracy. The implementation of a forgetting factor makes the models adaptive to changes in the speech of the user. The learning curves assess the quality of learning as a function of the amount of training data. We analyse the learning curves for all these developments through numerous experiments in realistic learning scenarios implemented in software. In this way, we acquire a sense of the system's performance in a real-world training environment. The use of the VUI in its operational context and the training of the VUI by the user are the two key aspects that inspired the conception, the developments and the research questions in this study.
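The interplay of MAP-style incremental updates and a forgetting factor can be sketched as follows. This is a toy running estimate of a word model's mean vector, not the dissertation's estimator: the prior acts as a regulariser, each utterance updates the model in one step, and the forgetting factor decays old evidence so the model can track changes in the user's speech.

```python
import numpy as np

class MapIncrementalModel:
    """Toy MAP-flavoured running estimate of a word model's mean vector,
    updated one utterance at a time, with a forgetting factor that
    down-weights old evidence (illustrative, not the thesis estimator)."""

    def __init__(self, prior_mean, prior_count=1.0, forget=0.95):
        self.mean = np.asarray(prior_mean, dtype=float)  # prior mean = MAP regulariser
        self.count = prior_count                         # effective evidence count
        self.forget = forget                             # 0 < forget <= 1

    def update(self, utterance_vec):
        # Decay accumulated evidence, then fold in the new observation.
        self.count = self.forget * self.count + 1.0
        lr = 1.0 / self.count
        self.mean = (1.0 - lr) * self.mean + lr * np.asarray(utterance_vec, dtype=float)
        return self.mean

# Usage: with no forgetting (forget=1.0) the estimate reduces to a plain
# running average over the prior mean and all observed utterance vectors.
model = MapIncrementalModel(prior_mean=np.zeros(3), prior_count=1.0, forget=1.0)
for _ in range(99):
    model.update(np.ones(3))
```

Setting `forget` below 1 caps the effective evidence count at 1/(1 - forget), so recent utterances always retain influence; this is one simple way a model can stay adaptive during lifelong use.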
Ons B., ''The self-taught speech interface'', dissertation presented to obtain the degree of Doctor in Engineering Science, KU Leuven, May 2015, Leuven, Belgium.