Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Date: 2018/06/01 - 2018/06/06, Location: New Orleans, Louisiana
Proceedings of the Workshop on Innovative Use of NLP for Building Educational Applications
Author:
Abstract:
In this paper, we introduce NT2Lex, a novel lexical resource for Dutch as a foreign language (NT2) which includes frequency distributions of 17,743 words and expressions attested in expert-written textbook texts and readers graded along the scale of the Common European Framework of Reference (CEFR). In essence, the lexicon informs us about what kind of vocabulary should be understood when reading Dutch as a non-native reader at a particular proficiency level. The main novelty of the resource with respect to the previously developed CEFR-graded lexicons concerns the introduction of corpus-based evidence for L2 word sense complexity through the linkage to Open Dutch WordNet (Postma et al., 2016). The resource thus contains, on top of the lemmatised and part-of-speech tagged lexical entries, a total of 11,999 unique word senses and 8,934 distinct synsets.