
Language Resources and Evaluation Conference (LREC), Date: 2014/05/26 - 2014/05/31, Location: Reykjavik, Iceland

Publication date: 2014-01-01
Pages: 4018 - 4022
ISBN: 9782951740884
Publisher: European Language Resources Association; Paris

Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)

Author:

Heylen, Kris
Bond, Stephen; De Hertog, Dirk; Vulić, Ivan; Kockaert, Hendrik

Keywords:

Social Sciences, Linguistics, Language & Linguistics, Computer Assisted Translation, Legal Terminology, Big Data

Abstract:

Increasingly, large bilingual document collections are being made available online, especially in the legal domain. This type of Big Data is a valuable resource that specialized translators exploit to search for informative examples of how domain-specific expressions should be translated. However, general-purpose search engines are not optimized to retrieve the previous translations that are most relevant to a translator. In this paper, we report on the TermWise project, a collaboration between terminologists, corpus linguists and computer scientists that aims to leverage big online translation data for terminological support to legal translators at the Belgian Federal Ministry of Justice. The project developed dedicated knowledge extraction algorithms and a server-based tool that provide translators with the most relevant previous translations of domain-specific expressions for the current translation assignment. This functionality is implemented as an extra database, a Term&Phrase Memory, which is meant to be integrated with existing Computer Assisted Translation tools. In the paper, we give an overview of the system, demonstrate the user interface, present a user-based evaluation by translators, and discuss how the tool fits into the general evolution towards exploiting Big Data in translation.
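The abstract describes the Term&Phrase Memory only at a high level. The following is a minimal, hypothetical sketch of the lookup idea, assuming a simple in-memory store: expressions extracted from an aligned bilingual corpus are mapped to counts of their observed translations, and for a new assignment every stored expression that occurs in the source text is returned with its translations ranked by frequency. The class `TermPhraseMemory`, its methods `add` and `lookup`, the `max_n` parameter, and the frequency-based ranking are all illustrative assumptions; they are not the TermWise implementation, whose extraction and relevance-ranking algorithms are not described here.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> tuple[str, ...]:
    # Lowercase and split on Unicode word characters (handles accented letters).
    return tuple(re.findall(r"\w+", text.lower()))


@dataclass
class TermPhraseMemory:
    """Hypothetical Term&Phrase Memory: maps a source-language expression
    (a tuple of lowercase tokens) to counts of its observed translations."""
    entries: dict[tuple[str, ...], Counter] = field(default_factory=dict)

    def add(self, source: str, target: str) -> None:
        # Record one aligned occurrence of source -> target from a bilingual corpus.
        self.entries.setdefault(tokenize(source), Counter())[target] += 1

    def lookup(self, document: str, max_n: int = 5) -> dict[str, list[tuple[str, int]]]:
        # Collect every token n-gram (1 <= n <= max_n) of the text to be translated.
        tokens = tokenize(document)
        ngrams = {
            tokens[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)
        }
        # Return stored translations of expressions found in the document,
        # most frequent translation first (a simplifying ranking assumption).
        return {
            " ".join(key): counts.most_common()
            for key, counts in self.entries.items()
            if key in ngrams
        }
```

For example, after recording the illustrative Dutch-French pair (`hof van beroep`, `cour d'appel`) twice, calling `lookup` on a Dutch sentence containing "Hof van Beroep" returns that pair with a count of 2. Exact matching on token n-grams is the simplest possible design; a tool aimed at real legal translators would likely also need lemmatization and context-sensitive relevance ranking.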