Computer speech and language vol:19 issue:2 pages:171-204
We present a novel language model, suitable for large-vocabulary continuous speech recognition, based on parsing with a probabilistic left corner grammar (PLCG). The PLCG probabilities are conditioned on local and non-local features of the partial parse tree, and some of these features are lexical. They are not derived from another stochastic grammar, but directly induced from a treebank, a corpus of text sentences, annotated with parse trees. A context-enriched constituent represents all partial parse trees that are equivalent with respect to the probability of the next parse move. For computational efficiency the parsing problem is represented as a traversal through a compact stochastic network of constituents connected by PLCG moves. The efficiency of the algorithm is due to the fact that the network consists of recursively nested, shared subnetworks. The PLCG-based language model results from accumulating the probabilities of all (partial) paths through this network. Next word probabilities can be computed synchronously with the probabilistic left corner parsing algorithm in one single pass from left to right. They are guaranteed to be normalized, even when pruning less likely paths. Finally, it is shown experimentally that the PLCG-based language model is a competitive alternative to other syntax-based language models, both in efficiency and accuracy. (C) 2004 Elsevier Ltd. All rights reserved.
Van Uytsel D.H., Van Compernolle D., ''Language modeling with probabilistic left corner parsing'', Computer speech and language, vol. 19, no. 2, pp. 171-204, April 2005.