Natural Language Engineering

Publication date: 2022-09-01
Volume: 28
Pages: 649-667
Publisher: Cambridge University Press (CUP)

Authors:

De Troij, Robbert ; Grondelaers, Stefan ; Speelman, Dirk ; van den Bosch, Antal

Keywords:

Science & Technology, Social Sciences, Technology, Computer Science, Artificial Intelligence, Linguistics, Language & Linguistics, syntactic variation, Dutch, existential constructions, memory-based learning, national variation, LANGUAGE, DEFINITENESS, 0801 Artificial Intelligence and Image Processing, 1702 Cognitive Sciences, 2004 Linguistics, Artificial Intelligence & Image Processing, 4602 Artificial intelligence, 4605 Data management and data science, 4704 Linguistics

Abstract:

This article uses computational tools to investigate the syntactic relationship between the closely related European national varieties of Dutch, viz. Belgian and Netherlandic Dutch. It reports on a series of memory-based learning (MBL) analyses of the post-verbal distribution of er ‘there’ in adjunct-initial existential constructions like Op het dak staat (er) een schoorsteen ‘On the roof (there) is a chimney’, a distribution that has been claimed to be among the most notoriously difficult variables in Dutch. On the basis of balanced datasets extracted from Flemish and Dutch newspaper corpora, it is shown that er’s distribution in both national varieties can be learned to a considerable extent from bare lexical input which is not assigned to higher-level categories. However, whereas this yields good results for Netherlandic Dutch, Belgian Dutch scores are consistently lower, suggesting that Belgian Dutch needs more than lexical features alone to attain accuracy scores comparable to those for Netherlandic Dutch. This ties in with earlier findings that the more advanced standardization of Netherlandic Dutch materializes in a higher lexical collocability, whereas Flemish speakers need additional higher-level linguistic information to insert er.
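
To illustrate the general idea of memory-based learning from bare lexical input, the following is a minimal sketch of a nearest-neighbour classifier that compares word-form features with a simple overlap metric. It is not the authors' implementation or feature set: the function names, the four-slot feature layout, and the toy instances and their labels are all invented for illustration and carry no empirical claim.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mbl_classify(memory, instance, k=1):
    """Return the majority label among the k stored instances nearest to `instance`."""
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy memory: each instance is a tuple of raw word forms
# (adjunct word 1, adjunct word 2, verb, subject noun) with an invented label.
memory = [
    (("op", "dak", "staat", "schoorsteen"), "no_er"),
    (("in", "tuin", "staat", "boom"), "no_er"),
    (("gisteren", "-", "was", "probleem"), "er"),
    (("vandaag", "-", "is", "vergadering"), "er"),
]

# Classify an unseen instance by looking up its nearest stored neighbour.
prediction = mbl_classify(memory, ("op", "dak", "staat", "antenne"), k=1)
print(prediction)
```

The sketch stores all training instances verbatim and defers generalization to classification time, which is the defining property of memory-based learning; no word is mapped to a higher-level category such as part of speech or semantic class.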