Title: A dataset for the evaluation of lexical simplification
Authors: De Belder, Jan
Moens, Marie-Francine #
Issue Date: 2012
Publisher: Springer
Host Document: Lecture Notes in Computer Science vol:7182 pages:426-437
Conference: CICLing conference on intelligent text processing and computational linguistics edition:13 location:New Delhi, India date:11–17 March 2012
Abstract: Lexical Simplification is the task of replacing individual words of a text with words that are easier to understand, so that the text as a whole becomes easier to comprehend, e.g. by people with learning disabilities or by children who learn to read. Although this seems like a straightforward task, evaluating algorithms for this task is not so. The problem is how to build a dataset that provides an exhaustive list of easier to understand words in different contexts, and to obtain an absolute ordering on this list of synonymous expressions. In this paper we reuse existing resources for a similar problem, that of Lexical Substitution, and transform this dataset into a dataset for Lexical Simplification. This new dataset contains 430 sentences, with in each sentence one word marked. For that word, a list of words that can replace it, sorted by their difficulty, is provided. The paper reports on how this dataset was created based on the annotations of different persons, and their agreement. In addition we provide several metrics for computing the similarity between ranked lexical substitutions, which are used to assess the value of the different annotations, but which can also be used to compare the lexical simplifications suggested by an algorithm with the ground truth model.
ISSN: 0302-9743
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
# (joint) last author

Files in This Item:
File: DeBelderMoensCICLING2012.pdf
Status: Published
Size: 307Kb
Format: Adobe PDF

