Title: Bridging the Gap between Pictographs and Natural Language
Authors: Vandeghinste, Vincent #
Issue Date: Dec-2012
Publisher: World Wide Web Consortium
Host Document: Proceedings of the RDWG Online Symposium on Easy-to-Read on the Web
Conference: Easy-to-Read on the Web, date: 3 December 2012
Abstract: The WAI-NOT environment is a platform that allows people with cognitive disabilities to communicate online using pictographs instead of text. It supports two pictograph sets. Users can enter messages using their pictograph set, text, or both. These messages are encoded as text and sent to the receiver, where they are decoded into the target pictograph set wherever possible. There are two problems with this approach:
1. The decoding is purely string-based and no disambiguation takes place, which occasionally leads to the wrong pictograph being generated; and
2. The current string-based overlap between the two sets is too small to be of practical use.

We have collected a corpus of 200K words of e-mail messages sent with WAI-NOT, and we show how simple NLP techniques such as part-of-speech tagging and lemmatisation can improve the conversion of messages from one pictograph set to the other. A relative improvement of more than 45% was reached on unseen data.
Furthermore, we discuss how, in the next phase, we will use word-sense disambiguation and linking to Cornetto, a lexical-semantic database, to further improve the results.
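
The contrast the abstract draws between string-based decoding and decoding on part-of-speech-tagged lemmas can be sketched roughly as follows. This is a toy illustration only, not the WAI-NOT implementation: the lexicons, pictograph identifiers, and the hand-supplied lemma and tag for each token are all invented here, and English words stand in for the actual message language.

```python
# Toy data: every entry below is invented for illustration only.

# String-based baseline: surface string -> pictograph id in the target set.
STRING_LEXICON = {
    "dog": "pic_dog",
    "walk": "pic_walk",
}

# Lemma-based variant: (lemma, part of speech) -> pictograph id in the target set.
TARGET_LEXICON = {
    ("dog", "NOUN"): "pic_dog",
    ("walk", "VERB"): "pic_walk",
    ("walk", "NOUN"): "pic_stroll",
}


def decode_string_based(tokens):
    """Look up each surface token as-is; None marks a token with no pictograph."""
    return [STRING_LEXICON.get(token.lower()) for token in tokens]


def decode_lemma_based(analysed):
    """Look up each token by its (lemma, POS) pair instead of its surface form."""
    return [TARGET_LEXICON.get((lemma, pos)) for _token, lemma, pos in analysed]


def coverage(pictographs):
    """Fraction of tokens for which some pictograph was found."""
    return sum(p is not None for p in pictographs) / len(pictographs)


# A message whose tokens carry a hand-written lemma and tag, standing in for
# the output of a real part-of-speech tagger and lemmatiser.
message = [
    ("dogs", "dog", "NOUN"),
    ("walked", "walk", "VERB"),
    ("a", "a", "DET"),
    ("walk", "walk", "NOUN"),
]

baseline = decode_string_based([token for token, _lemma, _pos in message])
improved = decode_lemma_based(message)

print(baseline, coverage(baseline))
print(improved, coverage(improved))
```

In this toy message the baseline finds a pictograph only for the final token, and it picks the verb pictograph for a noun reading, which mirrors the missing disambiguation described in problem 1. The lemma-based lookup covers the inflected forms and separates the two readings of "walk", which is the kind of gain the abstract attributes to tagging and lemmatisation.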
Publication status: accepted
KU Leuven publication type: IC
Appears in Collections: Formal and Computational Linguistics (ComForT), Leuven
# (joint) last author

Files in This Item:
File            Description  Status    Size  Format
easytoread.tar  html tar     Accepted  30Kb  HTML

