Chemometrics and Intelligent Laboratory Systems, vol. 123, pp. 36-43
Ordinary linear regression falls short when many predictors are available, especially when some of them are highly correlated with other predictors or with a linear combination of them. One possible solution is Principal Covariates Regression (PCovR), which combines the main ideas behind Principal Component Analysis (PCA) and regression. Like PCA, PCovR reduces the predictors to a few components; like regression, it predicts the criterion, but uses the components as predictors. The reduction of the predictors and the prediction of the criterion are carried out simultaneously, by minimizing a weighted sum of the reduction error and the prediction error. How the weighting parameter α should be tuned is not obvious, however. In this paper we integrate scattered findings on this topic and derive hypotheses about which α value is optimal in which respect, and about how much tuning matters (that is, how robust the obtained results are to the value chosen for α). We test these hypotheses in an extensive simulation study. As predicted, the α value that optimizes recovery of the underlying parameters and the true criterion scores depends, among other things, on the number of predictors in a specific dataset and on the ratio of the amount of error in the predictors to the amount of error in the criterion. Moreover, we show that the choice of α matters most when the components differ strongly in strength and relevance, when the number of observations almost equals the number of predictor variables, or when the criterion contains a moderate to high amount of error.
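
To make the weighted criterion concrete, the following is a minimal Python sketch of PCovR for a single criterion variable. It rests on assumptions not stated in the abstract: that α weights the reduction error and 1 - α the prediction error, that both errors are scaled by the total sum of squares of the predictors and of the criterion respectively, and that the components are obtained from the closed-form eigendecomposition commonly given for PCovR. The function name pcovr and its arguments are hypothetical and chosen for illustration only.

```python
import numpy as np


def pcovr(X, y, alpha, n_components):
    """Illustrative PCovR sketch for one criterion y (assumed formulation).

    Assumed loss, with component scores T = X W:
        alpha * ||X - T P_x||^2 / ||X||^2
        + (1 - alpha) * ||y - T p_y||^2 / ||y||^2
    Use 0 < alpha <= 1 and n_components <= rank(X); alpha = 0 with more
    than one component is degenerate in this sketch.
    """
    # Center predictors and criterion.
    X = X - X.mean(axis=0)
    y = y - y.mean()

    # Orthogonal projector onto the column space of X; the pseudo-inverse
    # keeps this well defined when predictors are collinear.
    X_pinv = np.linalg.pinv(X)
    y_proj = X @ (X_pinv @ y)

    # Weighted combination of the predictor part and the criterion part.
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1 - alpha) * np.outer(y_proj, y_proj) / np.sum(y ** 2))

    # eigh returns eigenvalues in ascending order, so reverse the columns
    # and keep the leading eigenvectors as orthonormal component scores.
    _, eigvecs = np.linalg.eigh(G)
    T = eigvecs[:, ::-1][:, :n_components]

    W = X_pinv @ T   # component weights, so that X @ W reproduces T
    P_x = T.T @ X    # loadings of the predictors on the components
    p_y = T.T @ y    # regression weights of the criterion on the components
    return T, W, P_x, p_y


# Hypothetical usage on simulated data (values chosen only for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
y_demo = X_demo[:, 0] + 0.1 * rng.normal(size=30)
T_demo, W_demo, P_demo, p_demo = pcovr(X_demo, y_demo, alpha=0.5, n_components=2)
```

Under these assumptions, α = 1 reproduces the component scores of PCA on the centered predictors, while values of α close to 0 push the first component towards the least-squares prediction of the criterion. Rerunning the sketch over a grid of α values is one simple way to inspect how sensitive the solution is to this choice.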