Journal of Near Infrared Spectroscopy, vol. 18, issue 4, pp. 231-237
Inverse modelling techniques, such as principal component regression, partial least squares regression and support vector machines, are powerful multivariate calibration strategies that are widely used in near infrared spectroscopy. However, these techniques are so efficient at finding correlations between the spectral variables and the parameter to be predicted that great care should be taken to avoid over-optimistic results by using a proper validation strategy. In this study, different validation strategies were investigated on a dataset that was acquired over several measurement days. The goal was to predict albumen freshness from spectral measurements. Validation procedures frequently applied in practice, i.e. 10-fold cross-validation (10-fold CV) and validation based on random subdivision into calibration and validation sets (RS), were compared with cross-validation across measurement days (MD). Whereas 10-fold CV and RS validation suggested that prediction of albumen freshness is possible, MD validation on the same dataset indicated that albumen freshness cannot be predicted from the spectral measurements. It is shown that inverse modelling is very sensitive to non-specific correlations between the spectral measurements and the dependent variable, which may be artifacts of the measurement protocol and will not persist in future measurements. Therefore, selecting the right validation strategy for a given application and critically evaluating the obtained results are crucial steps in inverse modelling to obtain useful calibration models. More specifically, in the context of process analytical technology, where spectra are acquired over time, great care should be taken to break the non-specific correlation between the dependent variable and the variations in the spectral measurements over time.
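The contrast between 10-fold CV and MD validation can be illustrated with a minimal Python sketch on synthetic data. Everything in it is an assumption made for illustration and not the study's data or method: the "spectra" contain only a day-specific baseline shift (the hypothetical `DRIFT` values) and no freshness information, the reference value is confounded with the measurement day, and a 1-nearest-neighbour predictor stands in for the inverse models named above.

```python
import math
import random

# Hypothetical baseline shift of the instrument on each measurement day,
# deliberately unrelated to the ordering of the reference values.
DRIFT = [3, 0, 6, 1, 7, 4, 2, 5]


def make_data(n_per_day=20, n_channels=5, seed=0):
    """Synthetic spectra that carry a day effect but no freshness signal."""
    rng = random.Random(seed)
    X, y, day = [], [], []
    for d, shift in enumerate(DRIFT):
        for _ in range(n_per_day):
            X.append([shift + rng.uniform(-0.05, 0.05) for _ in range(n_channels)])
            # Reference value depends only on the measurement day (confounding).
            y.append(d + rng.uniform(-0.2, 0.2))
            day.append(d)
    return X, y, day


def predict_1nn(X_train, y_train, x):
    """Return the reference value of the closest training spectrum."""
    def sq_dist(a):
        return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    best = min(range(len(X_train)), key=lambda i: sq_dist(X_train[i]))
    return y_train[best]


def rmse_cv(X, y, folds):
    """Cross-validated RMSE; folds is a list of lists of held-out indices."""
    sq_err = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for i in test_idx:
            sq_err += (predict_1nn(X_tr, y_tr, X[i]) - y[i]) ** 2
    return math.sqrt(sq_err / len(y))


def random_folds(n, k, seed=0):
    """Random k-fold split that ignores the measurement day."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def day_folds(day):
    """One fold per measurement day (leave-one-day-out)."""
    return [[i for i, d in enumerate(day) if d == g] for g in sorted(set(day))]


X, y, day = make_data()
rmse_10fold = rmse_cv(X, y, random_folds(len(y), 10))
rmse_md = rmse_cv(X, y, day_folds(day))
print(f"10-fold CV RMSE:        {rmse_10fold:.2f}")
print(f"leave-one-day-out RMSE: {rmse_md:.2f}")
```

In this sketch the random 10-fold split always leaves samples from the same day in the calibration set, so the model can exploit the day effect and the error looks small. Holding out whole days removes that shortcut and the error becomes much larger, even though the data are identical.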