International Journal of Corpus Linguistics, vol. 19, issue 4, pp. 478-504
As repositories of spontaneously realized language, corpora generally have an uncontrolled and unbalanced structure in which all variables operate simultaneously. Consequently, a variable’s real effect can be concealed when it is studied in isolation, because the impact of other, potentially confounding variables is left out of consideration. Using a variational case study, the alternation between inflected and uninflected attributive adjectives in Dutch, we will demonstrate how confounding variables alter the impact of explanatory variables on the response variable, resulting in spurious effects in bivariate analyses. Multiple Correspondence Analysis will be used as a heuristic tool to unveil the association patterns between explanatory variables in the data matrix that induce these spurious effects. Based on these findings, we will argue for a thorough analysis of the patterns in the database to gain insight into the underlying associations between explanatory variables before modeling their real impact on the response variable in a multivariate model.
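
The mechanism behind such spurious bivariate effects can be illustrated with a minimal sketch. The counts below are invented purely for illustration and are not taken from the Dutch adjective data; the variable names (`confounder` levels `a`/`b`, predictor levels `x1`/`x0`) are hypothetical. The predictor has no effect on inflection within either level of the confounder, yet because predictor and confounder are strongly associated, collapsing over the confounder suggests a sizeable effect.

```python
from collections import Counter

# Invented toy counts (NOT from the study):
# (confounder level, predictor level, response) -> frequency
counts = {
    ("a", "x1", "inflected"): 160, ("a", "x1", "uninflected"): 40,
    ("a", "x0", "inflected"): 40,  ("a", "x0", "uninflected"): 10,
    ("b", "x1", "inflected"): 10,  ("b", "x1", "uninflected"): 40,
    ("b", "x0", "inflected"): 40,  ("b", "x0", "uninflected"): 160,
}


def odds_ratio(table):
    """Odds of 'inflected' for x1 relative to x0 in a 2x2 table
    keyed by (predictor, response)."""
    return (table["x1", "inflected"] * table["x0", "uninflected"]) / (
        table["x1", "uninflected"] * table["x0", "inflected"]
    )


# Bivariate view: collapse the counts over the confounder.
marginal = Counter()
for (z, x, y), n in counts.items():
    marginal[x, y] += n

# Stratified view: one 2x2 table per confounder level.
strata = {}
for (z, x, y), n in counts.items():
    strata.setdefault(z, Counter())[x, y] += n

marginal_or = odds_ratio(marginal)
stratum_ors = {z: odds_ratio(table) for z, table in strata.items()}

print("odds ratio ignoring the confounder:", marginal_or)
print("odds ratios within confounder levels:", stratum_ors)
```

In this toy setting the within-level odds ratios show no association, while the collapsed table does; a multivariate model that includes the confounder would attribute the difference to the confounder rather than to the predictor.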