ITEM METADATA RECORD
Title: Translation-based word clustering for language models
Authors: Pelemans, Joris; Van hamme, Hugo; Wambacq, Patrick
Issue Date: 2015
Host Document: CLIN 2015 book of abstracts, pages: 60
Conference: Meeting of computational linguistics in The Netherlands - CLIN 2015, edition: 25, location: Antwerp, Belgium, date: 5-6 February 2015
Abstract: One of the major challenges in the field of language modelling (and others) is data sparsity. Even with the increasing amount of data, there is simply not enough data to reliably estimate probabilities for short word sequences, let alone full sentences. Hence, research in this field has focused largely on finding relations between words or word sequences, inferring probabilities for unseen events from seen events. In this work we focus on a new approach to cluster words by examining their translations in multiple languages. That is, if two words share the same translation in many languages, they are likely to be (near) synonyms. By adding some context to the hypothesized synonyms and by filtering out those that do not belong to the same part of speech, we are able to find meaningful word clusters. The clusters are incorporated into an n-gram language model by means of class expansion, i.e., the contexts of similar words are shared to achieve more reliable statistics for infrequent words. We compare the new model to a baseline word n-gram language model with interpolated Kneser-Ney smoothing.
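Illustrative sketch (not part of the original record): the abstract's two core ideas, pairing words that share translations in several languages and then sharing contexts between paired words, can be illustrated with a toy Python example. Everything here is an assumption made for illustration: the translation table, the part-of-speech tags, the `min_shared` threshold, the function names, and the unweighted summing of counts. The abstract does not specify how the authors score translations, add context to hypothesized synonyms, or weight shared statistics, so this is not their implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the paper): word -> {language: translation}.
TRANSLATIONS = {
    "car":        {"nl": "auto",   "fr": "voiture",  "de": "Auto"},
    "automobile": {"nl": "auto",   "fr": "voiture",  "de": "Auto"},
    "drive":      {"nl": "rijden", "fr": "conduire", "de": "fahren"},
    "ride":       {"nl": "rijden", "fr": "trajet",   "de": "fahren"},
}
# Hypothetical part-of-speech tags, used only for filtering.
POS = {"car": "NOUN", "automobile": "NOUN", "drive": "VERB", "ride": "NOUN"}


def shared_languages(a, b):
    """Count the languages in which words a and b have the same translation."""
    ta, tb = TRANSLATIONS[a], TRANSLATIONS[b]
    return sum(1 for lang in ta if lang in tb and ta[lang] == tb[lang])


def synonym_pairs(min_shared=2):
    """Pair words sharing >= min_shared translations and the same POS tag."""
    pairs = set()
    for a, b in combinations(sorted(TRANSLATIONS), 2):
        if POS[a] == POS[b] and shared_languages(a, b) >= min_shared:
            pairs.add((a, b))
    return pairs


def expand_counts(bigram_counts, pairs):
    """Share contexts: add each word's bigram counts to its paired words.

    Assumption: counts are summed unweighted; a real system would likely
    discount the borrowed counts and apply smoothing afterwards.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    expanded = defaultdict(int)
    for (prev, word), count in bigram_counts.items():
        expanded[(prev, word)] += count
        for syn in neighbours[word]:
            expanded[(prev, syn)] += count
    return dict(expanded)


pairs = synonym_pairs()
counts = {("the", "car"): 10, ("red", "car"): 4, ("the", "automobile"): 1}
expanded = expand_counts(counts, pairs)
```

In this toy data, "drive" and "ride" share two translations but are rejected by the part-of-speech filter, while the infrequent "automobile" inherits the context "red" from "car", which it never occurred with in the raw counts.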
Description: Pelemans J., Van hamme H., Wambacq P., ''Translation-based word clustering for language models'', Book of abstracts 25th meeting of computational linguistics in The Netherlands - CLIN 2015, pp. 60, February 5-6, 2015, Antwerp, Belgium.
URI: 
Publication status: published
KU Leuven publication type: IMa
Appears in Collections:ESAT - PSI, Processing Speech and Images

Files in This Item:
File: 3846__final.pdf
Status: Published
Size: 24Kb
Format: Adobe PDF

These files are only available to some KU Leuven Association staff members

 

