Journal of the American Statistical Association vol:101 issue:476 pages:1449-1464
This article is concerned with variable selection methods for the Cox proportional hazards regression model. Including excessive covariates causes extra variability and inflated confidence intervals for regression parameters; thus regimes for discarding the less informative ones are needed. Our framework has p covariates designated as "protected," while variables from a further set of q covariates are examined for possible inclusion or exclusion. We develop a focused information criterion (FIC) that, for a given interest parameter, finds the best subset of covariates. Thus the FIC might find that the best model for predicting median survival time is different from the best model for estimating probabilities, and the best overall model for analyzing men's survival might not be the same as the best overall model for analyzing women's survival. Methodology is also developed for model averaging, wherein the final estimate of a quantity is a weighted average of estimates computed for a range of submodels. Our methods are illustrated in simulations and for a survival study of Danish skin cancer patients.