Southern African Linguistics and Applied Language Studies

Publication date: 2010-12-23
Volume: 28
Pages: 283-290
Publisher: National Inquiry Services Centre

Authors:

Vanallemeersch, Tom
Kockaert, Hendrik

Keywords:

parallel corpora, legal translation, phraseology, terminology extraction, sentence alignment, word alignment, Social Sciences, Linguistics, Language & Linguistics, 2004 Linguistics, Languages & Linguistics, 4703 Language studies, 4704 Linguistics

Abstract:

We investigate the extent to which the detection of phraseological (in)consistency in the translation process can be automated. We describe the acquisition of a large corpus of Belgian legal documents consisting of French court rulings (arrêts) translated into Dutch. We apply the sentence alignment tool GMA to the corpus and extract phraseological unit candidates from the resulting sentence pairs, using the term candidate extraction tool TermCalc and the word alignment data produced by the GIZA++ tool. The candidates are compared to a reference set drawn from a manual study by an MA student at Lessius/KULeuven. They appear to cover only 33% of the bilingual phraseological unit pairs in the reference set and only four French units with more than one Dutch equivalent. This indicates the need to devise techniques specifically aimed at detecting multiple equivalence, and hence potential phraseological inconsistency.
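
The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `coverage` and `multiple_equivalents` are hypothetical, and the phrase pairs are invented for illustration rather than taken from the corpus. The sketch assumes that candidates and reference entries are both represented as (French, Dutch) string pairs and that a match means exact string equality.

```python
from collections import defaultdict


def coverage(candidates, reference):
    """Share of reference (French, Dutch) pairs that also occur among the candidates."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(candidates)) / len(reference)


def multiple_equivalents(pairs):
    """Group Dutch equivalents by French unit; keep units with more than one equivalent."""
    by_french = defaultdict(set)
    for french, dutch in pairs:
        by_french[french].add(dutch)
    return {fr: nl for fr, nl in by_french.items() if len(nl) > 1}


# Invented illustrative data (not from the corpus or the reference study).
reference_pairs = [
    ("attendu que", "overwegende dat"),
    ("attendu que", "aangezien"),
    ("par ces motifs", "om deze redenen"),
]
candidate_pairs = [
    ("attendu que", "overwegende dat"),
    ("la cour", "het hof"),
]

print(f"coverage: {coverage(candidate_pairs, reference_pairs):.0%}")
print("multiple equivalence in reference:", multiple_equivalents(reference_pairs))
print("multiple equivalence in candidates:", multiple_equivalents(candidate_pairs))
```

A French unit that maps to several Dutch equivalents in the reference set but to at most one among the candidates is the kind of case the abstract reports as poorly covered. Exact string matching is the simplest possible criterion; a real comparison would probably need normalisation or lemmatisation of the units.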