Title: Detecting highly confident word translations from comparable corpora without any prior knowledge
Authors: Vulic, Ivan ×
Moens, Marie-Francine #
Issue Date: Apr-2012
Publisher: ACL
Host Document: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages: 449-459
Conference: Conference of the European Chapter of the Association for Computational Linguistics, edition: 13, location: Avignon, France, date: 23-27 April 2012
Abstract: In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint. We report our results for Italian-English and Dutch-English language pairs that outperform the current state-of-the-art results by a significant margin. In addition, we show how to use the algorithm for the construction of high-quality initial seed lexicons of translations.
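The ideas named in the abstract (greedy harvesting, symmetrization, and the one-to-one constraint) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes per-topic word distributions are already available as NumPy arrays, uses cosine similarity over topic vectors as a stand-in scoring function, and the function name `harvest_translations`, the `min_score` threshold, and the toy Italian-English data are all hypothetical.

```python
import numpy as np

def harvest_translations(phi_src, phi_tgt, src_vocab, tgt_vocab, min_score=0.9):
    """Toy precision-oriented harvesting of translation pairs.

    phi_src: (K, |V_src|) array, row k is P(word | topic k) in the source language.
    phi_tgt: (K, |V_tgt|) array, same K shared topics, target language.
    Assumes every word has a nonzero topic vector (no all-zero columns).
    """
    # Represent each word by its column over the K shared topics (L2-normalised).
    S = phi_src / np.linalg.norm(phi_src, axis=0, keepdims=True)
    T = phi_tgt / np.linalg.norm(phi_tgt, axis=0, keepdims=True)
    sim = S.T @ T  # cosine similarity, shape (|V_src|, |V_tgt|)

    free_src = list(range(len(src_vocab)))
    free_tgt = list(range(len(tgt_vocab)))
    lexicon = []
    while free_src and free_tgt:
        sub = sim[np.ix_(free_src, free_tgt)]
        best_tgt = sub.argmax(axis=1)  # best target for each remaining source word
        best_src = sub.argmax(axis=0)  # best source for each remaining target word
        # Symmetrization: accept (s, t) only if each is the other's top candidate,
        # and only if the score clears the confidence threshold.
        accepted = [
            (free_src[i], free_tgt[j])
            for i, j in enumerate(best_tgt)
            if best_src[j] == i and sub[i, j] >= min_score
        ]
        if not accepted:
            break
        lexicon.extend((src_vocab[s], tgt_vocab[t]) for s, t in accepted)
        # One-to-one constraint: matched words leave both search spaces.
        used_src = {s for s, _ in accepted}
        used_tgt = {t for _, t in accepted}
        free_src = [s for s in free_src if s not in used_src]
        free_tgt = [t for t in free_tgt if t not in used_tgt]
    return lexicon

# Hypothetical toy data: K = 3 topics, three words per language.
src_vocab = ["casa", "cane", "acqua"]
tgt_vocab = ["water", "house", "dog"]
phi_src = np.array([[0.8, 0.1, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8]])
phi_tgt = np.array([[0.1, 0.7, 0.2],
                    [0.1, 0.2, 0.7],
                    [0.8, 0.1, 0.1]])
lexicon = harvest_translations(phi_src, phi_tgt, src_vocab, tgt_vocab)
```

Raising `min_score` trades recall for precision: fewer pairs are accepted, but those that remain are the ones both languages agree on most strongly, which is the property a seed lexicon would need.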
ISBN: 978-1-937284-19-0
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File: VulicMoensEACL2012final.pdf
Description: Main article
Status: Published
Size: 433 Kb
Format: Adobe PDF