Population stratification (PS) refers to systematic differences in allele frequencies between subpopulations of a population. It can produce false-positive findings in case-control association studies, where the apparent association arises from the structure of the underlying population rather than from a disease-associated locus. In this paper, we study the joint effects of PS and data sampling when the genetic effect is null. The size of the PS effect depends on how much the baseline genotype frequency varies across subpopulations and on the matching effectiveness of the sampling. Under simple random sampling (SRS), the matching effectiveness equals the inverse of the variation of the disease odds, so the PS bias is null when disease risk is constant across subpopulations. However, if disease risk is constant but the sampling is not SRS, the bias may still exist, and its magnitude grows as the true sampling scheme deviates further from SRS. We also derive bounds for the bias. If these bounds are approximately known or can be estimated, we show that they can be used to compute a conservative p value for the usual association test. Two real-data examples demonstrate the application of the new method.
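As a rough illustration, and not the authors' method, the sketch below computes the expected case-minus-control difference in allele frequency under a null genetic effect in a population made of discrete subpopulations. It assumes that disease risk varies only by subpopulation and not by genotype, and it uses allele frequencies in place of genotype frequencies. Non-SRS sampling is represented by hypothetical per-subpopulation sampling weights. The function name `ps_bias` and its parameters are introduced here for illustration only.

```python
from typing import Optional, Sequence


def _weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of `values` weighted by `weights` (weights need not sum to 1)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def ps_bias(
    pop_weights: Sequence[float],
    allele_freqs: Sequence[float],
    disease_risks: Sequence[float],
    case_sampling: Optional[Sequence[float]] = None,
    control_sampling: Optional[Sequence[float]] = None,
) -> float:
    """Expected case-minus-control allele frequency difference under a null
    genetic effect (hypothetical helper, illustrative only).

    pop_weights      -- share of each subpopulation in the population
    allele_freqs     -- baseline allele frequency in each subpopulation
    disease_risks    -- disease risk in each subpopulation (same for all genotypes)
    case_sampling    -- relative sampling weight of cases per subpopulation
    control_sampling -- relative sampling weight of controls per subpopulation
    Equal sampling weights correspond to simple random sampling (SRS).
    """
    k = len(pop_weights)
    if case_sampling is None:
        case_sampling = [1.0] * k
    if control_sampling is None:
        control_sampling = [1.0] * k

    # Subpopulation mix among sampled cases: proportional to w_k * r_k * s_k.
    case_mix = [w * r * s for w, r, s in zip(pop_weights, disease_risks, case_sampling)]
    # Subpopulation mix among sampled controls: proportional to w_k * (1 - r_k) * s_k.
    control_mix = [
        w * (1.0 - r) * s for w, r, s in zip(pop_weights, disease_risks, control_sampling)
    ]
    return _weighted_mean(allele_freqs, case_mix) - _weighted_mean(allele_freqs, control_mix)


if __name__ == "__main__":
    w, p = [0.5, 0.5], [0.2, 0.6]
    # SRS with constant risk: no PS bias.
    print(ps_bias(w, p, [0.1, 0.1]))
    # SRS with risk varying across subpopulations: bias appears.
    print(ps_bias(w, p, [0.05, 0.2]))
    # Constant risk, but cases oversampled from subpopulation 2 (not SRS): bias appears.
    print(ps_bias(w, p, [0.1, 0.1], case_sampling=[1.0, 2.0]))
```

In this toy setting the first call returns 0, matching the statement that SRS gives no PS bias under constant disease risk. The second call (about 0.137) shows bias from risk varying across subpopulations. The third (1/15) shows that bias can remain under constant risk once sampling departs from SRS.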