Title: Knowledge transfer across multilingual corpora via latent topics
Authors: De Smet, Wim ×
Tang, Jie
Moens, Marie-Francine #
Issue Date: 2011
Publisher: Springer
Series Title: Lecture Notes in Computer Science vol:6634 pages:549-560
Conference: PAKDD 2011: the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Abstract: This paper explores bridging the content of two different languages via latent topics. Specifically, we propose a unified probabilistic model to simultaneously model latent topics from bilingual corpora that discuss comparable content and use the topics as features in a cross-lingual, dictionary-less text categorization task. Experimental results on multilingual Wikipedia data show that the proposed topic model effectively discovers the topic information from the bilingual corpora, and the learned topics successfully transfer classification knowledge to other languages, for which no labeled training data are available.
ISSN: 0302-9743
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File: DeSmetPAKDD2011.pdf
Description: Main article
Status: Published
Size: 292Kb
Format: Adobe PDF


All items in Lirias are protected by copyright, with all rights reserved.
