Title: Learning to bridge colloquial and formal language applied to linking and search of e-commerce data
Authors: Vulic, Ivan ×
Zoghbi, Susana
Moens, Marie-Francine #
Issue Date: 2014
Publisher: ACM
Host Document: Proceedings of the 37th annual ACM SIGIR conference on research and development in information retrieval (SIGIR 2014), pages 1195-1198
Conference: The 37th annual ACM SIGIR conference on research and development in information retrieval (SIGIR 2014), Gold Coast, Australia, 6-11 July 2014
Abstract: We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles follow the intuition that certain words are shared between two idioms of the same language, while other words are non-shared, that is, idiom-specific. We demonstrate the ability of our model to learn relations between cross-idiomatic topics in a dataset containing product descriptions and reviews. We intrinsically evaluate our model by the perplexity measure. Following that, as an extrinsic evaluation, we present the utility of the new MiLDA topic model in a recently proposed IR task of linking Pinterest pins (given in colloquial English on the users' side) to online webshops (given in formal English on the retailers' side). We show that our multi-idiomatic model outperforms the standard monolingual LDA model and the pure bilingual LDA model both in terms of perplexity and MAP scores in the IR task.
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File: VulicZoghbiMoensSIGIR2014Final.pdf
Status: Published
Size: 385Kb
Format: Adobe PDF

These files are only available to some KU Leuven Association staff members
