Computational Linguistics in the Netherlands (CLIN) 24, Leiden, 17 January 2014
In Dutch verb-final clauses, the verbs tend to form a cluster that cannot be interrupted by non-verbal material, as shown in (1).
(1) ... dat hij het haar gisteren had verteld.
* ... dat hij het haar had gisteren verteld.
"... that he had told her that yesterday."
However, the Algemene Nederlandse Spraakkunst (1997), as well as other studies of the phenomenon, lists several cases in which the verb cluster may be interrupted by so-called cluster creepers. The most common examples are constructions with separable verb particles, but examples with nouns, adjectives, and adverbs are also attested; cf. (2), where the adjective bezig interrupts the cluster kunnen ... houden.
(2) ... dat we ons daar nog mee kunnen bezig houden.
"... that we can still keep ourselves busy with that."
Since most of the data in previous studies were collected through introspection and elicitation, it is worth comparing those findings with corpus data. The corpus analysis is based on data from Dutch treebanks (CGN, LASSY, SoNaR), which make it possible to take regional and/or stylistic variation into account. This is important for the analysis, since cluster creeping has been reported to be typical of spoken and regional varieties of Dutch.
The goal of this corpus-based investigation is, on the one hand, to provide insight into the frequency of the phenomenon and, on the other hand, to classify the types of cluster creepers. Besides the linguistic analysis, methodological issues concerning the extraction of the relevant data from the treebanks will also be addressed.
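The abstract does not describe how the relevant clauses are extracted from the treebanks. As a minimal illustration of the detection criterion only, the sketch below assumes a verb-final clause has already been isolated and tagged with a hypothetical, simplified tag set ("V" for verbs, "ADV", "ADJ", and so on). It treats every non-verbal token between the first and the last verb of the clause as a cluster creeper and tallies creepers by tag, mirroring the classification goal. This is not the query used in the study; real treebank data would need extra care, for instance with te-infinitives (the so-called third construction), where non-verbal material can legitimately precede an embedded verb and would be wrongly flagged here.

```python
from collections import Counter
from typing import List, Tuple

# A token is a (word, tag) pair. The tag set is a hypothetical simplification,
# not the CGN/LASSY/SoNaR annotation scheme.
Token = Tuple[str, str]
VERB_TAG = "V"


def cluster_creepers(clause: List[Token]) -> List[Token]:
    """Return the non-verbal tokens lying between the first and last verb.

    Assumes `clause` is a verb-final clause, so all its verbs belong to the
    final cluster (a simplification that ignores extraposition).
    """
    verb_positions = [i for i, (_, tag) in enumerate(clause) if tag == VERB_TAG]
    if len(verb_positions) < 2:
        return []  # a single verb cannot form an interruptible cluster
    first, last = verb_positions[0], verb_positions[-1]
    return [tok for tok in clause[first:last + 1] if tok[1] != VERB_TAG]


def classify_creepers(clauses: List[List[Token]]) -> Counter:
    """Count cluster creepers per tag over a collection of clauses."""
    counts: Counter = Counter()
    for clause in clauses:
        for _, tag in cluster_creepers(clause):
            counts[tag] += 1
    return counts


# Examples (1) and (2) from the text, with hypothetical tags.
EX1_OK = [("dat", "C"), ("hij", "PRON"), ("het", "PRON"), ("haar", "PRON"),
          ("gisteren", "ADV"), ("had", "V"), ("verteld", "V")]
EX1_BAD = [("dat", "C"), ("hij", "PRON"), ("het", "PRON"), ("haar", "PRON"),
           ("had", "V"), ("gisteren", "ADV"), ("verteld", "V")]
EX2 = [("dat", "C"), ("we", "PRON"), ("ons", "PRON"), ("daar", "ADV"),
       ("nog", "ADV"), ("mee", "ADP"), ("kunnen", "V"), ("bezig", "ADJ"),
       ("houden", "V")]

if __name__ == "__main__":
    for clause in (EX1_OK, EX1_BAD, EX2):
        print(" ".join(w for w, _ in clause), "->", cluster_creepers(clause))
    print(classify_creepers([EX1_OK, EX1_BAD, EX2]))
```

Restricting the search to the span between the first and last verb keeps clause-initial material such as daar and nog in (2) out of the count, so only bezig is reported for that clause.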