TITLE: Group association test using a hidden Markov model

ABSTRACT:

In the genomic era, group association tests are of great interest. Because the number of individual genomic features is overwhelming, a test of association for one feature at a time often has very little power, and the effect sizes of most features are small. Many methods have been proposed to test the association of a trait with a group of features within a functional unit as a whole, e.g. all single nucleotide polymorphisms (SNPs) in a gene, yet few of these methods account for the fact that a substantial proportion of the features are generally not associated with the trait. In this paper, we propose to model the feature-level associations in the group as a mixture of zero and non-zero associations, which explicitly accounts for the possibility that a fraction of the features are not associated with the trait while others in the group are. The feature-level associations are first estimated by generalized linear models; the sequence of these estimated associations is then modeled by a hidden Markov chain. To test for global association, we develop a modified likelihood ratio test based on a log-likelihood function that ignores higher-order dependency plus a penalty term. We derive the asymptotic distribution of the likelihood ratio test under the null hypothesis. Furthermore, we obtain the posterior probability of association for each feature, which provides evidence of feature-level association and is useful for potential follow-up studies. In simulations and a data application, we show that the proposed method performs well compared with existing group association tests, especially when only a few features are associated with the outcome.
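
As a rough illustration of the feature-level posterior probabilities mentioned above, the sketch below runs a standard scaled forward-backward pass over a two-state hidden Markov chain (state 0 = no association, state 1 = association). It is not the method of the paper: the Gaussian emission densities, the function name posterior_association, and all parameter values are assumptions made for illustration only, and the modified likelihood and penalty term of the proposed test are not reproduced.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def posterior_association(z, trans, init, alt_mean, alt_sd):
    """Posterior probability that each feature is in the 'associated' state.

    z        : standardized feature-level association estimates (assumed input)
    trans    : 2x2 transition matrix, rows sum to 1
    init     : initial state probabilities, length 2
    alt_mean : mean of the assumed Gaussian emission in the associated state
    alt_sd   : standard deviation of that emission
    """
    z = np.asarray(z, dtype=float)
    trans = np.asarray(trans, dtype=float)
    init = np.asarray(init, dtype=float)
    n = z.size

    # Emission densities: column 0 is the null N(0, 1), column 1 the alternative.
    emis = np.column_stack(
        [normal_pdf(z, 0.0, 1.0), normal_pdf(z, alt_mean, alt_sd)]
    )

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha = np.zeros((n, 2))
    scale = np.zeros(n)
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, using the same scaling constants.
    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1]) / scale[t + 1]

    # Combine and normalize to get per-feature state posteriors.
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]


# Hypothetical example: five features, two of which have large estimates.
z_example = [0.1, -0.3, 4.0, 3.5, 0.2]
trans_example = [[0.9, 0.1], [0.3, 0.7]]
init_example = [0.8, 0.2]
post_example = posterior_association(
    z_example, trans_example, init_example, alt_mean=3.0, alt_sd=1.0
)
print(np.round(post_example, 3))
```

In this toy setting the features with large estimates receive high posterior probability of association, while the chain structure lets neighboring features borrow strength from one another.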

Bio

Yichen Cheng received a bachelor's degree in Mathematics and Applied Mathematics from Fudan University, China, and then earned a PhD in Statistics from Texas A&M University. Cheng is currently a postdoc at the Fred Hutchinson Cancer Research Center, with research interests in computational statistics and biostatistics.