Unraveling and unlocking the assets of principal covariates regression

Publication date: 2017-09-26

Author:

Vervloet, Marlies

Keywords:

regression, dimension reduction, multicollinearity, model selection

Abstract:

In the behavioral sciences, researchers often link a criterion to multiple predictors using multiple linear regression. In almost all cases, the main aim of the analysis is to better understand the unique and shared relations between the predictors and the criterion. A frequent complication is that the set of potential predictors is rather large. This creates an interpretational burden, because the regression weights reflect only the unique effects of the predictors on the criterion and shed no light on shared effects. Moreover, the more predictors there are, the more likely it is that at least some of them will be highly correlated with a linear combination of the others. This phenomenon, known as multicollinearity, is problematic because it leads to unstable regression weights. A promising but overlooked method for dealing with these complications, originally presented in chemometrics, is principal covariates regression (PCovR). PCovR tackles them through dimension reduction: it captures the main information in the predictors in a limited number of summarizing variables, called components, and simultaneously uses these components to predict the criterion. The most important assets of PCovR are that this simultaneous optimization of reduction and prediction always has a closed-form solution, and that users can choose, through a weighting parameter, the degree to which reduction and prediction are emphasized. Nevertheless, PCovR is not often used in the behavioral sciences because of some remaining obstacles, which we attempt to clear in this doctoral dissertation.

In Chapter 1, we zoom in on the weighting parameter, reporting the results of a literature study and an extensive simulation study on how to tune it. Model selection in PCovR, however, consists not only of selecting the weighting parameter value but also of selecting the number of components. In Chapter 2, we propose four model selection strategies and test their performance in a simulation study. Moreover, we compare the obtained PCovR solution with those of two more popular dimension-reduction-based techniques, partial least squares (PLS) and principal components regression (PCR), and show that PCovR outperforms both in recovering data-generating components that explain variance in the criterion. Chapter 3 compares the performance of PCovR and exploratory structural equation modeling (ESEM), a factor-analysis-based method that can be used to estimate PCovR-like models. Finally, in Chapter 4, we present the R package PCovR, which allows users to perform all PCovR analysis steps: preprocessing the data, parameter estimation, model selection, and rotating the retained solution for easier interpretation.
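
To make the closed-form solution and the weighting parameter concrete, the following is a minimal Python sketch, not the dissertation's implementation and not the API of the R package PCovR. It assumes the commonly cited eigendecomposition formulation of PCovR for a single criterion, with column-centered data; the function name `pcovr_sketch` and the argument names `alpha` and `n_components` are hypothetical labels chosen for this illustration.

```python
import numpy as np


def pcovr_sketch(X, y, n_components, alpha):
    """Illustrative PCovR fit (assumed formulation, single criterion).

    X : (n, p) array of column-centered predictors.
    y : (n,) array holding the centered criterion.
    alpha : weight in [0, 1]; larger values emphasize reduction of X,
            smaller values emphasize prediction of y.
    """
    y = y.reshape(-1, 1)
    X_pinv = np.linalg.pinv(X)

    # Projection of y onto the column space of X (the OLS fitted values).
    y_hat = X @ X_pinv @ y

    # Weighted sum of the scaled predictor and criterion cross-products.
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1 - alpha) * (y_hat @ y_hat.T) / np.sum(y ** 2))

    # Component scores: eigenvectors of G with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    T = eigvecs[:, order]

    W = X_pinv @ T   # component weights, so that T = X W
    P_x = T.T @ X    # loadings that reconstruct X from T
    p_y = T.T @ y    # regression weights of y on the components
    return T, W, P_x, p_y
```

Under these assumptions, setting `alpha = 1` reduces the sketch to principal components regression, whereas values close to 0 put nearly all emphasis on predicting the criterion. How to tune this weight and how many components to retain are the model selection questions addressed in Chapters 1 and 2.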