Journal of Near Infrared Spectroscopy (submitted)
The multivariate partial least squares (PLS) technique is very powerful; however, great care should be taken to ensure that the results obtained actually reflect the matters of interest. In this work, different validation procedures were investigated on a dataset collected over several measurement days. Validation procedures frequently applied in practice, i.e. leave-one-out cross-validation (LOOCV) and validation based on a random subdivision into calibration and validation sets (RSS), were compared with a validation in which the dataset was split into a calibration and a validation set according to measurement day (MD). LOOCV and RSS validation led to very high RPD values, whereas MD validation indicated that the spectra contained no information about the dependent variable. The PLS analysis was shown to exploit small differences between spectra measured on different days for prediction. A random permutation of the dependent variable showed that these differences were not related to differences in the dependent variable. The authors conclude that the PLS model must be interpreted and fully understood before the results generated by the PLS analysis can be relied upon. Furthermore, they state that when the dataset is gathered over time, the only reliable PLS validation is one based on splitting the dataset into a calibration and a validation set according to measurement day (MD).
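To illustrate the mechanism described above, the following Python sketch (NumPy only) simulates a hypothetical dataset in which the reference value drifts from day to day while the spectra carry only a day-specific instrument offset and no information about the dependent variable. It is not the authors' data or code: the simulation parameters, the NIPALS PLS1 implementation, the number of latent variables and all function names (`pls1_fit`, `simulate`, `cv_predict`, `compare`) are assumptions chosen for illustration. RPD is taken here as the standard deviation of the reference values divided by the root mean squared error of prediction, and MD validation is implemented as a leave-one-day-out variant of the day-based split.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Single-response PLS (PLS1) via NIPALS; returns (x_mean, y_mean, B)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk                     # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                        # scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        q = (yk @ t) / tt                 # y loading
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)   # regression vector B = W (P'W)^-1 q
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return y_mean + (X - x_mean) @ B


def rpd(y_true, y_pred):
    """RPD = SD of reference values / root mean squared error of prediction."""
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return np.std(y_true, ddof=1) / rmsep


def simulate(seed=0, n_days=8, n_per_day=20, n_wl=50):
    """Hypothetical data: y drifts between days; spectra only carry a day offset."""
    rng = np.random.default_rng(seed)
    day = np.repeat(np.arange(n_days), n_per_day)
    day_mean = rng.normal(0.0, 1.0, n_days)            # reference drift per day
    y = day_mean[day] + rng.normal(0.0, 0.05, day.size)
    baseline = np.sin(np.linspace(0.0, 3.0, n_wl))      # common spectral shape
    offset = rng.normal(0.0, 0.05, (n_days, n_wl))      # day-specific shift, unrelated to y
    X = baseline + offset[day] + rng.normal(0.0, 0.01, (day.size, n_wl))
    return X, y, day


def cv_predict(X, y, folds, n_comp):
    """Predict each fold from a model calibrated on all other samples."""
    y_hat = np.empty_like(y)
    for test in folds:
        train = np.setdiff1d(np.arange(y.size), test)
        y_hat[test] = pls1_predict(pls1_fit(X[train], y[train], n_comp), X[test])
    return y_hat


def compare(seed=0, n_comp=7):
    X, y, day = simulate(seed)
    n = y.size
    # LOOCV: leave one sample out at a time
    loocv = cv_predict(X, y, [np.array([i]) for i in range(n)], n_comp)
    # RSS: one random calibration/validation split (about 2/3 vs 1/3)
    rng = np.random.default_rng(seed + 1)
    val = rng.permutation(n)[: n // 3]
    cal = np.setdiff1d(np.arange(n), val)
    rss = pls1_predict(pls1_fit(X[cal], y[cal], n_comp), X[val])
    # MD (leave-one-day-out variant): all samples of a day are left out together
    md = cv_predict(X, y, [np.flatnonzero(day == d) for d in np.unique(day)], n_comp)
    return {"LOOCV": rpd(y, loocv), "RSS": rpd(y[val], rss), "MD": rpd(y, md)}


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name:6s} RPD = {value:.2f}")
```

Under these assumptions, LOOCV and RSS should report high RPD values, because every validation sample has calibration samples from the same day whose shared spectral offset identifies that day's mean reference value. Leaving whole days out removes this shortcut, and the RPD should drop to about 1 or below.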