TITLE: Non-sparse methods for out-of-sample prediction
in high-dimensional linear models
SPEAKER: Dr. Lee Dicker
ABSTRACT:
Motivated by questions about dense (non-sparse) signals in
high-dimensional data analysis, we study the unconditional out-of-sample
prediction error (predictive risk) associated with three classes of dense
estimators for high-dimensional linear models: ridge regression
estimators, scalar multiples of the ordinary least squares estimator
(which we refer to as James-Stein estimators), and marginal regression
estimators. Our results require no assumptions about sparsity and imply
that in linear models where the number of predictors is roughly
proportional to the number of observations:
(i) If the population predictor covariance is known (or if a
norm-consistent estimator is available), then the ridge estimator
outperforms the James-Stein estimator; (ii) both the ridge and James-Stein
estimators outperform the ordinary least squares estimator, and the
improvements offered by these estimators are especially significant when
the signal-to-noise ratio is small; and (iii) the marginal estimator has
serious deficiencies for out-of-sample prediction. We derive new
closed-form expressions for the asymptotic predictive risk of these
estimators, which allow us to quantify claims (i)-(iii) precisely.
Additionally, we identify minimax ridge and James-Stein estimators.
Finally, we argue that the ridge estimator is, in fact, asymptotically
optimal among dense estimators for out-of-sample prediction in
high-dimensional linear models.
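
For orientation, one standard way to write the objects above (a sketch;
the exact parametrizations used in the talk may differ): given training
data (X, y) from the linear model y = X\beta + \epsilon and an
independent new draw (x_{new}, y_{new}), the predictive risk of an
estimator \hat{\beta} is

    R(\hat{\beta}) = E[(y_{new} - x_{new}^\top \hat{\beta})^2],

and the three classes of dense estimators take the forms

    ridge:        \hat{\beta}_\lambda = (X^\top X + \lambda I)^{-1} X^\top y,
    James-Stein:  \hat{\beta}_c = c \, \hat{\beta}_{OLS},
    marginal:     \hat{\beta}_t = t \, X^\top y,

with scalar tuning parameters \lambda, c, and t.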
Contact: Lee Dicker <ldicker@stat.rutgers.edu>
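
Below is a minimal simulation sketch in Python (our construction, not
code from the talk). The settings n, d, sigma, the choice Sigma = I,
and the oracle tuning of lambda, c, and t from the true beta are
illustrative assumptions:

    # Compare out-of-sample prediction error of OLS, ridge, a
    # James-Stein-style scaled OLS, and marginal regression on a
    # dense (non-sparse) signal with d/n = 0.5.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma, reps = 400, 200, 1.0, 50   # assumed settings

    risks = {"ols": [], "ridge": [], "james_stein": [], "marginal": []}
    for _ in range(reps):
        beta = rng.normal(size=d) / np.sqrt(d)   # dense signal, ||beta|| ~ 1
        X = rng.normal(size=(n, d))              # Sigma = I (assumed known)
        y = X @ beta + sigma * rng.normal(size=n)

        b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

        # Ridge with the oracle penalty lam = sigma^2 * d / ||beta||^2
        # (Bayes-optimal when Sigma = I and beta is isotropic).
        lam = sigma**2 * d / (beta @ beta)
        b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

        # James-Stein-style estimator: the scalar multiple of OLS that
        # minimizes out-of-sample risk (oracle scaling, for illustration).
        c = (b_ols @ beta) / (b_ols @ b_ols)
        b_js = c * b_ols

        # Marginal regression: oracle-scaled multiple of X'y / n.
        z = X.T @ y / n
        b_marg = ((z @ beta) / (z @ z)) * z

        # With Sigma = I: E(y_new - x_new'b)^2 = ||b - beta||^2 + sigma^2.
        for name, b in [("ols", b_ols), ("ridge", b_ridge),
                        ("james_stein", b_js), ("marginal", b_marg)]:
            risks[name].append(np.sum((b - beta) ** 2) + sigma**2)

    for name, r in risks.items():
        print(f"{name:12s} predictive risk ~ {np.mean(r):.3f}")

The printed average risks can be compared against claims (i)-(iii); the
size of the gaps depends on the signal-to-noise ratio and on the ratio
d/n.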