Title: Cross-language information retrieval with latent topic models trained on a comparable corpus
Authors: Vulic, Ivan ×
De Smet, Wim
Moens, Marie-Francine #
Issue Date: 2011
Publisher: Springer
Host Document: Lecture Notes in Computer Science, pages 37-48
Conference: Proceedings of the 7th Asian information retrieval societies conference (AIRS 2011), Dubai, United Arab Emirates, 18-20 December 2011
Abstract: In this paper, we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles. The bilingual Latent Dirichlet Allocation model (BiLDA) creates an interlingual representation that can serve as a translation resource in many different multilingual settings, since comparable corpora are available for many language pairs. The probabilistic interlingual representation is incorporated in a statistical language model for information retrieval. Experiments on the English and Dutch test datasets of the CLEF 2001-2003 CLIR campaigns show that our approach performs competitively with cross-language retrieval methods that rely on pre-existing translation dictionaries, whether hand-built or constructed from parallel corpora.
ISSN: 0302-9743
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File: VulicDeSmetMoensAIRS2011.pdf
Description: Main article
Status: Published
Size: 271Kb
Format: Adobe PDF


