
Probabilistic language modeling with left corner parsing

Publication date: 2003-09-17

Author:

Van Uytsel, Nils Dong Hoon

Keywords:

PSI_SPEECH

Abstract:

This thesis contributes to the research domain of statistical language modeling. In this domain, the human generation of natural language is represented as a probabilistic data stream of sentences. Language modeling research attempts to find optimal estimators of the probability that some sentence is produced, typically within the context of a given natural language application. Statistical language models are an essential part of automatic speech recognition systems, among other natural language applications. The research described in this thesis is limited to the class of grammar-based language models. In contrast with conventional language modeling techniques, these models predict the probability of the input sentence, which is observable, from an ambiguous intermediate prediction of the grammatical structure of that sentence, which is not observable. This approach was believed to be unsuited for statistical large-vocabulary speech recognition. Together with a few other recent publications, this doctoral study shows that taking grammatical structure into account enables significant improvements in language model performance. The most important concrete result is the development of a language model that follows a lexicalized left-corner parsing strategy. My adaptation of this well-known parsing strategy represents several concurrent sentence analyses and their probabilities efficiently in both time and space. The proposed language model is initialized with a set of hand- or machine-parsed sentences, but is then further optimized on plain text. It can predict the conditional probability of the next word given the preceding ones. This is important for integration with conventional language modeling techniques, and for early combination with the other knowledge sources in the search engine of a speech recognizer. Finally, the proposed language model is shown to improve recognition accuracy on the Wall Street Journal task relative to a combination of a word-based trigram model and a class-based 4-gram model. The improvement over another recently published grammar-based model is smaller, but at equal performance levels the left-corner parsing model requires only a fraction of the other model's training time.
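
As a rough illustration of the conditional-probability property mentioned above, the sketch below shows how a parsing-based language model can expose next-word probabilities: it maintains a set of weighted partial analyses of the word prefix and marginalizes over them. This is a minimal sketch, not the implementation from the thesis: the functions advance() and next_word_probs() are hypothetical stand-ins for the left-corner parser's transition and word-prediction models, and the thesis represents concurrent analyses with an efficient shared structure rather than the explicit pruned enumeration used here.

from collections import defaultdict

def consume_word(states, word, advance, beam_size=100):
    """Extend each weighted partial analysis with the observed word.

    `states` maps (hashable) parser states to their joint probability with
    the sentence prefix; `advance(state, word)` (hypothetical) yields
    (successor_state, transition_probability) pairs.  Pruning to a beam
    keeps time and space bounded.
    """
    successors = defaultdict(float)
    for state, p in states.items():
        for succ, q in advance(state, word):
            successors[succ] += p * q
    best = sorted(successors.items(), key=lambda kv: -kv[1])[:beam_size]
    return dict(best)

def predict_next_word(states, next_word_probs):
    """P(w | prefix) = sum over states s of P(s | prefix) * P(w | s).

    `next_word_probs(state)` (hypothetical) returns the lexicalized
    parser's word distribution in that state.
    """
    total = sum(states.values())
    dist = defaultdict(float)
    for state, p in states.items():
        for word, q in next_word_probs(state).items():
            dist[word] += (p / total) * q
    return dict(dist)

def interpolate(p_parser, p_ngram, lam=0.5):
    """Linear interpolation with a conventional n-gram model, one
    standard way of combining a grammar-based model with n-gram
    techniques."""
    return lam * p_parser + (1.0 - lam) * p_ngram

Because the model is defined over word prefixes rather than complete sentences, probabilities of this form can be interpolated with an n-gram model word by word and supplied to the recognizer's search engine during decoding, which is the integration property the abstract emphasizes.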