Title: The Construction of a 500-million-word Reference Corpus of Contemporary Written Dutch
Authors: Oostdijk, Nelleke; Reynaert, Martin; Hoste, Veronique; Schuurman, Ineke
Issue Date: 2013
Publisher: Springer
Series Title: Theory and Applications of Natural Language Processing
Host Document: Essential Speech and Language Technology for Dutch: resources, tools and applications, pp. 219-247
Article number: 13
Abstract: The construction of a large and richly annotated corpus of written Dutch was identified as one of the priorities of the STEVIN programme. Such a corpus, sampling texts from both conventional and new media, is invaluable for scientific research and application development. This chapter describes how the Dutch reference corpus was developed in two consecutive STEVIN-funded projects, viz. D-Coi and SoNaR. The construction of the corpus was guided by (inter)national standards and best practices. At the same time, the achievements of and experience gained in the D-Coi and SoNaR projects have contributed to the further advancement and dissemination of these standards and practices.
ISBN: 978-3-642-30909-0
VABB publication type: VABB-4
Publication status: published
KU Leuven publication type: IHb
Appears in Collections: Formal and Computational Linguistics (ComForT), Leuven

Files in This Item: sonar-springer-v3.utf8tilde.pdf (Published; 306 KB; Adobe PDF)

