Title: The latent words language model
Authors: Deschacht, Koen
De Belder, Jan
Moens, Marie-Francine # ×
Issue Date: Oct-2012
Publisher: Academic Press
Series Title: Computer Speech and Language, vol. 26, issue 5, pages 384-409
Abstract: We present a new generative model of natural language, the latent words language model. This model uses a latent variable for every word in a text that represents synonyms or related words in the given context. We develop novel methods to train this model and to find the expected value of these latent variables for a given unseen text. The learned word similarities help to reduce the sparseness problems of traditional n-gram language models. We show that the model significantly outperforms interpolated Kneser-Ney smoothing and class-based language models on three different corpora. Furthermore, the latent variables are useful features for information extraction. We show that for both semantic role labeling and word sense disambiguation, the performance of a supervised classifier increases when these variables are incorporated as extra features. This improvement is especially large when only a small annotated corpus is used for training.
ISSN: 0885-2308
Publication status: published
KU Leuven publication type: IT
Appears in Collections: Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File: Deschachtetal2012.pdf
Status: Published
Size: 680Kb
Format: Adobe PDF


All items in Lirias are protected by copyright, with all rights reserved.
