Title: Robust and sparse canonical correlation analysis
Authors: Croux, Christophe; Wilms, Ines
Issue Date: 2014
Conference: 25th Nordic Conference in Mathematical Statistics, Turku (Finland), 2-6 June 2014
Abstract: Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each data set. However, in high-dimensional settings, where the number of variables exceeds the sample size, or when the variables are highly correlated, traditional CCA is no longer appropriate. This talk discusses a method for Robust Sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each data set. More precisely, some of the elements of the canonical vectors will be estimated as exactly zero. As such, the interpretability of the canonical variates is increased. We also robustify the method such that it can cope with outliers in the data.
We convert the CCA problem into an alternating regression framework. To obtain sparse canonical vectors, we add an L1 penalty on the coefficient estimates to the Least Squares estimator, which yields the lasso. The lasso, however, is not robust to outliers. The method can easily be robustified by replacing the lasso with the sparse Least Trimmed Squares estimator.
We illustrate the good performance of the Robust Sparse CCA method in several simulation studies. In addition, the Robust Sparse CCA method is applied to a genomic data set.
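The alternating regression idea in the abstract can be illustrated with a minimal sketch for the first canonical pair. This is not the authors' implementation: the function names (`lasso_cd`, `sparse_cca_first_pair`), the coordinate-descent lasso solver, the SVD-based starting value, the unit-variance rescaling, and the penalty parameters `lam_x` and `lam_y` are all assumptions made for illustration. Only the non-robust lasso step is shown; the robust variant described above would replace it with a sparse Least Trimmed Squares fit, which is not implemented here.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=500, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X w (w starts at zero)
    for _ in range(n_sweeps):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j excluded).
            rho = X[:, j] @ r / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                r -= delta * X[:, j]
                w[j] = w_new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return w


def unit_variance(M, w):
    """Rescale w so that the variate M @ w has unit (uncorrected) variance."""
    s = np.linalg.norm(M @ w) / np.sqrt(M.shape[0])
    if s == 0.0:
        raise ValueError("all coefficients are zero; try a smaller penalty")
    return w / s


def sparse_cca_first_pair(X, Y, lam_x, lam_y, n_iter=50, tol=1e-6):
    """Alternate lasso regressions to estimate one sparse canonical pair (a, b)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Assumed starting value: first principal direction of Y.
    b = unit_variance(Y, np.linalg.svd(Y, full_matrices=False)[2][0])
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Regress the current Y-variate on X to update a.
        a_new = unit_variance(X, lasso_cd(X, Y @ b, lam_x))
        # Regress the updated X-variate on Y to update b.
        b_new = unit_variance(Y, lasso_cd(Y, X @ a_new, lam_y))
        change = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if change < tol:
            break
    return a, b
```

Calling `sparse_cca_first_pair(X, Y, lam_x=0.1, lam_y=0.1)` on two data matrices with the same number of rows returns coefficient vectors in which variables unrelated to the shared signal tend to be set exactly to zero. The penalty values are illustrative only; in practice they would need to be tuned, for example by cross-validation.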
Publication status: published
KU Leuven publication type: IMa
Appears in Collections: Research Center for Operations Research and Business Statistics (ORSTAT), Leuven