Distribution of the Number of False Discoveries in Genome-Wide Linkage Scan

Project: A - Government Institution, b - Ministry of Science and Technology

Project Details

Description

The major purpose of modern genetic epidemiology is to identify the genetic basis of complex diseases or traits of interest, and linkage and association analyses are its two fundamental tools. In recent years, driven by the “common disease, common variant” hypothesis and advances in genotyping technology, genome-wide association studies (GWAS) grew rapidly and, for a time, diminished the importance of linkage analysis in human genetic research. Although GWAS have so far identified more than 9000 single nucleotide polymorphisms (SNPs) associated with human diseases or traits, most of them explain only a small proportion of disease risk or of the variation in quantitative traits. This phenomenon is called “missing heritability”. To address it, recent studies have turned to rare variants, motivated by the “common disease, rare variant” hypothesis and the development of “next-generation” sequencing (NGS) technology. Combining genome-wide linkage analysis with NGS is currently an important strategy for identifying causal variants of complex diseases, since linkage analysis is powerful for detecting variants with large effect sizes, which are often rare in the population. False-positive results, however, which arise mainly from the multiple testing problem, hamper the use of linkage analysis. Since the purpose of a genome-wide linkage scan is to identify genomic regions that potentially harbor the causal variants of complex diseases or traits, providing an appropriate estimate of the false discovery rate (FDR) for these findings is an efficient approach. An important issue in estimating the FDR of significant findings from a genome-wide linkage scan is the correlation between the test statistics.
Efron (2007) described the effect of correlation on inference in large-scale hypothesis testing, particularly on the estimation of the FDR. In this project, we will investigate the number of false discoveries in genome-wide linkage scans by developing a non-parametric multipoint linkage analysis method based on a Monte Carlo approach. We will estimate the distribution of the number of false discoveries while accounting for the correlation between the multiple linkage test statistics. Then, through simulation studies, we will compare the FDR calculated from this estimated distribution with that calculated from distributions estimated without accounting for correlation. In addition, we will apply our methods to real data sets. Our methods will help researchers incorporate linkage information efficiently into the identification of causal variants of complex diseases or traits.
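To illustrate why correlation matters for the number of false discoveries, the following is a minimal Monte Carlo sketch, not the project's method. It makes several assumptions that do not come from the description above: null test statistics are standard normal, correlation between neighbouring markers follows a simple AR(1) pattern with parameter rho (a stand-in for linkage-induced correlation along a chromosome), and the function name, marker count, and threshold are hypothetical illustrative choices rather than a real LOD-score cutoff.

```python
import numpy as np


def simulate_null_exceedance_counts(n_markers=500, rho=0.9, threshold=2.0,
                                    n_replicates=2000, seed=0):
    """Monte Carlo distribution of the number of null statistics above a threshold.

    Assumption (illustrative only): statistics along the genome follow a
    stationary Gaussian AR(1) process, so each one is marginally N(0, 1)
    and adjacent statistics have correlation rho. rho=0 gives independence.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_replicates, n_markers))
    z = np.empty_like(noise)
    z[:, 0] = noise[:, 0]
    # Scaling the innovation keeps the marginal variance equal to 1.
    innovation_scale = np.sqrt(1.0 - rho ** 2)
    for j in range(1, n_markers):
        z[:, j] = rho * z[:, j - 1] + innovation_scale * noise[:, j]
    # Every exceedance is a false discovery because all statistics are null.
    return (z > threshold).sum(axis=1)


if __name__ == "__main__":
    correlated = simulate_null_exceedance_counts(rho=0.9)
    independent = simulate_null_exceedance_counts(rho=0.0)
    for label, counts in (("correlated", correlated), ("independent", independent)):
        print(label, "mean:", counts.mean(), "variance:", counts.var())
```

Under these assumptions the expected number of false discoveries is the same with and without correlation, since each statistic has the same marginal distribution, but the correlated case has a much wider distribution of counts. That spread is what an FDR estimate ignoring correlation would miss.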
Status: Finished
Effective start/end date: 8/1/13 to 7/31/15

Keywords

  • false discovery rate
  • genome-wide linkage analysis
  • Monte Carlo approach
  • multiple testing problems
  • rare variant