EAMT, Date: 2011/05/30 - 2011/05/31, Location: Leuven

Publication date: 2011-05-01
Pages: 347 - 347
ISBN: 9789081486118
Publisher: Centre for Computational Linguistics; Leuven

Proceedings of the 15th Annual Conference of the European Association for Machine Translation

Authors:

Vandeghinste, Vincent ; Van den Bogaert, Joachim ; Martens, Scott ; Kotzé, Gideon

Keywords:

machine translation, syntax

Abstract:

The PaCo-MT project is building a stochastic example-based transfer system translating from Dutch into English and French, and vice versa. It is a data-driven tree-to-tree approach to MT, transducing the input parse tree into a set of target-language parse trees without node ordering. This Synchronous Tree Substitution Grammar (limited to regular subtrees) is induced from a subtree-aligned parallel treebank, using a discriminative model for tree alignment. Monolingual parses are created by pre-existing parsers: the Alpino parser for Dutch, the Stanford parser for English, and the Berkeley parser for French. A tree-based target language modeler, using a probabilistic context-free grammar based on large monolingual treebanks, decodes the output forest and determines node ordering. With this approach we aim to combine the strengths of data-driven MT with those of rule-based MT, while avoiding the weaknesses of each. Results show that although BLEU scores are not yet on par with Moses, long-distance movements pose no problems for our approach, and we do not drop important words, yielding more grammatical output than phrase-based SMT (PBSMT) systems.
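As a rough illustration of the last step in that pipeline, the sketch below orders the children of an unordered target tree by picking, at each node, the child sequence with the highest PCFG rule probability. It is not the PaCo-MT implementation: the `Node` class, the `best_order` function, the `RULE_PROB` table, and its probabilities are all hypothetical toy values, and the sketch handles a single tree, whereas the system described above decodes a forest of alternatives.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # order is NOT meaningful yet
    word: str = ""                                # filled in for leaves only


# Hypothetical toy PCFG: P(child label sequence | parent label).
# A real model would be estimated from a large monolingual treebank.
RULE_PROB = {
    ("S", ("NP", "VP")): 0.9,
    ("S", ("VP", "NP")): 0.1,
    ("VP", ("V", "NP")): 0.8,
    ("VP", ("NP", "V")): 0.2,
}
UNSEEN = 1e-6  # assumed fallback probability for rules missing from the table


def best_order(node):
    """Return (score, words) for the most probable ordering of this subtree."""
    if not node.children:
        return 1.0, [node.word]

    # Order each child subtree independently, then choose the sibling order.
    solved = [best_order(child) for child in node.children]
    labels = [child.label for child in node.children]

    def rule_prob(perm):
        sequence = tuple(labels[i] for i in perm)
        return RULE_PROB.get((node.label, sequence), UNSEEN)

    # Exhaustive search over sibling permutations; fine for small fan-out only.
    perm = max(permutations(range(len(labels))), key=rule_prob)

    score = rule_prob(perm)
    words = []
    for i in perm:
        score *= solved[i][0]
        words.extend(solved[i][1])
    return score, words


# The transfer step is assumed to deliver children in arbitrary order.
tree = Node("S", [
    Node("VP", [Node("NP", word="the book"), Node("V", word="reads")]),
    Node("NP", word="she"),
])
score, words = best_order(tree)
print(" ".join(words))
```

Because ordering is decided by the target-side grammar rather than by the source word order, a constituent can end up far from its source position, which is one way to read the claim about long-distance movements above.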