Detecting disease association signals with multiple genetic variants and covariates

K. F. Cheng, J. Y. Lee

Research output: Contribution to journal › Article

Abstract

Owing to improvements in the efficiency of resequencing technologies, the discovery and analysis of rare variants in sequencing-based association studies at the gene level, or even exome-wide, are becoming increasingly feasible. Powerful association tests have been suggested in the literature for testing whether a group of variants in a gene region is associated with a particular disease of interest. Their performance depends on correct specification of the regression model and on conditions such as the sizes of the case and control samples, the numbers of causal and noncausal variants (rare or common), variant frequency, effect size and directionality, and the rate of missing genotypes. Most of these model-based tests require genotype data to be complete at each variant. Our previous results showed that, in the absence of covariates, the power of these tests might be greatly affected when there were missing genotypes and only simple imputation was used. In this paper, we demonstrate by simulations that, in the presence of covariates, the type I error rates of these approaches might be inflated even when the genotype missing rate was very small. We present an association test based on testing for a zero proportion of causal variants in the gene region, and show this test to be almost uniformly most powerful among the competing tests under very general simulation conditions. This test does not require genotypes to be complete and hence is robust against missing genotypes. We discuss how to adjust for population stratification using principal components and show that the power loss of this approach was small when the population stratification effect was moderate. We use data from the Shanghai Breast Cancer Study to demonstrate application of the tests and show that the proposed test is more powerful in detecting variants related to breast cancer and is robust against the inclusion of noncausal variants.

Original language: English
Pages (from-to): 1281-1294
Number of pages: 14
Journal: Statistical Methods in Medical Research
Volume: 26
Issue number: 3
DOIs: 10.1177/0962280215574541
Publication status: Published - Jun 1 2017

Keywords

  • Association test
  • asymptotic
  • bootstrap
  • covariate
  • missing genotype
  • power
  • random effects

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

Cite this

Detecting disease association signals with multiple genetic variants and covariates. / Cheng, K. F.; Lee, J. Y.

In: Statistical Methods in Medical Research, Vol. 26, No. 3, 01.06.2017, p. 1281-1294.

@article{c4523f6eb42b4ccfaa133e058eff2135,
title = "Detecting disease association signals with multiple genetic variants and covariates",
abstract = "Due to the improvements in the efficiency of resequencing technologies, discoveries and analyses of rare variants in sequencing-based association studies at the gene level, or even exome-wide are becoming increasingly feasible. Powerful association tests have been suggested in literature for testing whether a group of variants in a gene region is associated with a particular disease of interest. Their performance depends on the correct assumption of regression model and conditions such as the size of the case and control sample, numbers of causal and noncausal variants (rare or common), variant frequency, effect size and directionality, rate of missing genotype, etc. Most of these model-based tests require genotype data to be complete at each variant. Our previous results showed that in the case of no covariate, the power of these tests might be greatly influenced, when there were missing genotypes and only simple imputation was used. In this paper, we demonstrate by simulations that in the presence of covariates, the type I errors of these approaches might be inflated, even when genotype missing rate was very small. We present an association test based on testing zero proportion of causal variants in the gene region, and show this test to be almost uniformly most powerful among the competing tests under very general simulation conditions. This test does not require genotype to be complete and hence is robust against missing genotype. We discuss how to adjust for population stratification based on principal components and show the power loss of this approach was small when the population stratification effect was moderate. We use a Shanghai Breast Cancer Study to demonstrate application of the tests and show the proposed test is more powerful in detecting variants related to breast cancer, and robust against the inclusion of noncausal variants.",
keywords = "Association test, asymptotic, bootstrap, covariate, missing genotype, power, random effects",
author = "Cheng, {K. F.} and Lee, {J. Y.}",
year = "2017",
month = "6",
day = "1",
doi = "10.1177/0962280215574541",
language = "English",
volume = "26",
pages = "1281--1294",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "3",
}

TY - JOUR
T1 - Detecting disease association signals with multiple genetic variants and covariates
AU - Cheng, K. F.
AU - Lee, J. Y.
PY - 2017/6/1
Y1 - 2017/6/1
AB - Due to the improvements in the efficiency of resequencing technologies, discoveries and analyses of rare variants in sequencing-based association studies at the gene level, or even exome-wide are becoming increasingly feasible. Powerful association tests have been suggested in literature for testing whether a group of variants in a gene region is associated with a particular disease of interest. Their performance depends on the correct assumption of regression model and conditions such as the size of the case and control sample, numbers of causal and noncausal variants (rare or common), variant frequency, effect size and directionality, rate of missing genotype, etc. Most of these model-based tests require genotype data to be complete at each variant. Our previous results showed that in the case of no covariate, the power of these tests might be greatly influenced, when there were missing genotypes and only simple imputation was used. In this paper, we demonstrate by simulations that in the presence of covariates, the type I errors of these approaches might be inflated, even when genotype missing rate was very small. We present an association test based on testing zero proportion of causal variants in the gene region, and show this test to be almost uniformly most powerful among the competing tests under very general simulation conditions. This test does not require genotype to be complete and hence is robust against missing genotype. We discuss how to adjust for population stratification based on principal components and show the power loss of this approach was small when the population stratification effect was moderate. We use a Shanghai Breast Cancer Study to demonstrate application of the tests and show the proposed test is more powerful in detecting variants related to breast cancer, and robust against the inclusion of noncausal variants.
KW - Association test
KW - asymptotic
KW - bootstrap
KW - covariate
KW - missing genotype
KW - power
KW - random effects
UR - http://www.scopus.com/inward/record.url?scp=85020691997&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020691997&partnerID=8YFLogxK
U2 - 10.1177/0962280215574541
DO - 10.1177/0962280215574541
M3 - Article
VL - 26
SP - 1281
EP - 1294
JO - Statistical Methods in Medical Research
JF - Statistical Methods in Medical Research
SN - 0962-2802
IS - 3
ER -