Owing to improvements in the efficiency of resequencing technologies, the discovery and analysis of rare variants in sequencing-based association studies at the gene level, or even exome-wide, are becoming increasingly feasible. Powerful association tests have been proposed in the literature for testing whether a group of variants in a gene region is associated with a particular disease of interest. Their performance depends on correct specification of the regression model and on conditions such as the sizes of the case and control samples, the numbers of causal and noncausal variants (rare or common), variant frequency, effect size and directionality, and the rate of missing genotypes. Most of these model-based tests require genotype data to be complete at each variant. Our previous results showed that, in the absence of covariates, the power of these tests might be greatly affected when genotypes were missing and only simple imputation was used. In this paper, we demonstrate by simulation that, in the presence of covariates, the type I error rates of these approaches might be inflated even when the genotype missing rate is very small. We present an association test based on testing for a zero proportion of causal variants in the gene region, and show this test to be almost uniformly most powerful among the competing tests under very general simulation conditions. This test does not require genotype data to be complete and hence is robust against missing genotypes. We discuss how to adjust for population stratification based on principal components and show that the power loss of this approach is small when the population stratification effect is moderate. We use data from the Shanghai Breast Cancer Study to demonstrate application of the tests and show that the proposed test is more powerful in detecting variants related to breast cancer and is robust against the inclusion of noncausal variants.
ASJC Scopus subject areas
- Statistics and Probability
- Health Information Management