Improving fuzzy matching through syntactic knowledge

Vanallemeersch, Tom; Vandeghinste, Vincent

Translating and the Computer

Author:

Vanallemeersch, Tom

Vandeghinste, Vincent

Keywords:

Translation Memory, Fuzzy Matching

Abstract:

In current CAT tools, fuzzy matching in translation memories (TM) is mostly string-based. These tools look for TM sentences that are highly similar to an input sentence, using edit distance to detect the differences between sentences, and they apply little or no linguistic knowledge in doing so. In the recently started SCATE project, which aims at improving translators’ efficiency, we apply syntactic fuzzy matching in order to detect more abstract similarities and to increase the number of fuzzy matches. We parse TM sentences to create hierarchical structures that identify constituents and/or dependencies. To assess the usefulness of syntactic matching relative to string-based matching, we calculate TER (Translation Error Rate) between an existing human translation of an input sentence and the translation of its fuzzy match in the TM. First results hint at the potential of syntactic matching to lower TER for sentences that receive a low match score in a string-based setting.
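To make the string-based baseline concrete, the following is a minimal sketch of edit-distance fuzzy matching over word tokens. It is an illustration only, not the SCATE implementation or the scoring formula of any particular CAT tool: the function names, the normalisation by the longer sentence, the whitespace tokenisation, and the 0.7 threshold are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over word tokens; every edit costs 1."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,           # delete tok_a
                            curr[j - 1] + 1,       # insert tok_b
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_score(source: str, candidate: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical token sequences.

    Assumption: lowercased whitespace tokens, normalised by the longer sentence.
    """
    a, b = source.lower().split(), candidate.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


def best_match(source: str,
               tm: List[Tuple[str, str]],
               threshold: float = 0.7) -> Optional[Tuple[float, str, str]]:
    """Return (score, TM source, TM target) for the best match at or above
    the (hypothetical) threshold, or None if no TM entry qualifies."""
    best = None
    for tm_source, tm_target in tm:
        score = fuzzy_score(source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


if __name__ == "__main__":
    tm = [
        ("the cat sat on the mat", "de kat zat op de mat"),
        ("press the red button", "druk op de rode knop"),
    ]
    print(best_match("the dog sat on the mat", tm))
```

A scorer of this kind only rewards shared surface tokens, so two sentences with the same syntactic structure but different words receive a low score. That is the kind of case the syntactic matching described in the abstract is meant to address.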
