Journal of Physical Chemistry B, vol. 110, issue 45, pages 22786-22795
We analyze publicly available data from Affymetrix microarray spike-in experiments on the human HGU133 chip set, in which sequences are added in solution at known concentrations. The spike-in set contains sequences of bacterial, human, and artificial origin. Our analysis is based on a recently introduced molecular-based model (Carlon, E.; Heim, T. Physica A 2006, 362, 433) that takes into account both probe-target hybridization and target-target partial hybridization in solution. The hybridization free energies are obtained from the nearest-neighbor model with experimentally determined parameters. The molecular-based model suggests a rescaling that should result in a "collapse" of the data at different concentrations onto a single universal curve. We indeed find such a collapse, with the same parameters as obtained previously for the older HGU95 chip set. The quality of the collapse varies according to the probe set considered. Artificial sequences, chosen by Affymetrix to be as different as possible from any human genome sequence, generally show a much better collapse, and thus better agreement with the model, than all other sequences. This suggests that the observed deviations from the predicted collapse are related to the choice of probes or have a biological origin, rather than indicating a problem with the proposed model.
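As a minimal sketch of how a nearest-neighbor free energy is assembled, the Python snippet below sums one stacking contribution per pair of adjacent bases along a probe sequence. It is an illustration under stated assumptions, not the implementation used in the paper: the function name, the `initiation` term, the kcal/mol unit, and above all the numerical values in `PLACEHOLDER_STACKING` are hypothetical. The experimentally determined parameters referred to in the abstract are not reproduced here and would have to be taken from the literature.

```python
from typing import Dict


def nearest_neighbor_free_energy(
    sequence: str,
    stacking: Dict[str, float],
    initiation: float = 0.0,
) -> float:
    """Return the nearest-neighbor free energy of a sequence.

    The result is the initiation term plus one stacking term for each
    pair of adjacent bases (dinucleotide step) read along the sequence.
    Units follow those of the supplied parameters (assumed kcal/mol).
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")

    total = initiation
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]  # e.g. "AC", then "CG", ...
        if step not in stacking:
            raise KeyError(f"no stacking parameter for step {step!r}")
        total += stacking[step]
    return total


# PLACEHOLDER values only: every step is assigned -1.0 so that the
# arithmetic is easy to check by hand. These are NOT measured parameters.
PLACEHOLDER_STACKING = {a + b: -1.0 for a in "ACGT" for b in "ACGT"}

# A 25-base sequence (the probe length on Affymetrix arrays) has
# 24 dinucleotide steps.
example_probe = "ACGT" * 6 + "A"
delta_g = nearest_neighbor_free_energy(example_probe, PLACEHOLDER_STACKING)
```

With real parameters the sixteen steps would carry different values, so the free energy would depend on the probe sequence and not only on its length.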