Abstract: We investigate the robustness of the model-X knockoffs framework to a misspecified or estimated feature distribution. We do so by theoretically studying the feature selection performance of a practically implemented knockoffs algorithm, which we call the approximate knockoffs (ARK) procedure, as measured by the false discovery rate (FDR) and the k-familywise error rate (k-FWER). The approximate knockoffs procedure differs from the model-X knockoffs procedure only in that it uses the misspecified or estimated feature distribution. A key technique in our theoretical analysis is to couple the approximate knockoffs procedure with the model-X knockoffs procedure so that the random variables in the two procedures are close in realization. We prove that if such a coupled model-X knockoffs procedure exists, the approximate knockoffs procedure achieves asymptotic FDR or k-FWER control at the target level. We showcase three specific constructions of such coupled model-X knockoff variables, verifying their existence and justifying the robustness of the model-X knockoffs framework. Additionally, we formally connect our concept of knockoff variable coupling to a type of Wasserstein distance.

 

Bio: Lan Gao is an Assistant Professor in the Department of Business Analytics and Statistics at the University of Tennessee, Knoxville (UTK). She obtained her bachelor’s degree in statistics from Wuhan University in 2015 and her PhD in statistics from the Chinese University of Hong Kong in 2019. She worked as a postdoctoral scholar at the University of Southern California before joining UTK. Her research interests lie in high-dimensional statistics and inference, nonparametric statistics, asymptotic theory, and machine learning.