Replication studies are a commonly used verification method to filter out false positives in genome-wide association studies (GWAS). If an association is confirmed in a replication study, it is considered a true positive with high confidence. To design a replication study, traditional approaches calculate power by treating the replication study as another independent primary study. These approaches do not use the information provided by the primary study. Moreover, they need to specify a minimum detectable effect size, which may be subjective. One might replace the minimum effect size with the observed effect sizes in the power calculation. However, this approach leads to an underpowered design: because only the positive associations from the primary study are followed up, their observed effect sizes are inflated by the "winner's curse", so the power is overestimated and the chosen sample size is too small.
We propose an Empirical Bayes (EB) based method to estimate the power of the replication study for each association, together with a corresponding credible interval. Simulation experiments show that our method is better than other plug-in based estimators in terms of overcoming the winner's curse and providing higher estimation accuracy, and that the coverage probability of the given credible interval is well calibrated. A weighted average is used to estimate the average power of all underlying true associations, which in turn determines the sample size of the replication study. Using our method, we estimated the sample sizes needed to replicate findings for 6 diseases from the Wellcome Trust Case Control Consortium (WTCCC). They are higher than the sample sizes obtained by plugging the observed effect sizes into the power calculation.
Our new method objectively determines the replication study's sample size by using information extracted from the primary study, and it alleviates the winner's curse. Thus, it is a better choice when designing replication studies of GWAS. The R package is available at: http://bioinformatics.ust.hk/RPower.html.
Genome-wide association studies (GWAS) are widely used to identify susceptibility variants of common diseases. Commonly, single nucleotide polymorphisms (SNPs) are genotyped across the whole genome in different individuals, and statistical methods are used to detect the associations between SNPs and disease status. According to the summary of the GWAS catalog ([1], accessed [2015.05.28]), about 2,000 GWAS reports related to 756 diseases/traits have been published so far, from which 14,609 associations show genome-wide significance (p-value \(\leq 5\times 10^{-8}\)). More and more associations will be discovered from GWAS.
The basic statistical method used in GWAS analysis is hypothesis testing [2]. The possibility of false positives cannot be completely removed in the analysis. Hence, all findings from GWAS need to be verified. A replication study is a commonly used approach to verifying positive findings [3, 4]. If an association between a specific SNP and a certain disease has been identified in the primary study and confirmed in the replication study, we usually treat this association as a true positive with high confidence. If an association identified in the primary study cannot be confirmed in the replication study, we often suspect that it is a false positive.
The power of the replication study is crucial in this validation process. If the replication study is underpowered, then the positive findings will have a low chance of being replicated. It is therefore essential to design a replication study with enough statistical power.
How can we estimate the power of a replication study in the design phase?
Traditionally, a replication study is regarded as another independent primary study, so the same power calculation as in the original primary study is used. For the associations identified in the primary study, a minimum effect size needs to be specified. The underlying alternative distribution of the test statistics is then assumed to have this specified effect size. The major limitation of this traditional power calculation is that the specification of the effect size is subjective and may introduce bias. Besides, no information from the primary study is actually used.
One might plug the observed effect sizes from the primary study into the power calculation of the replication study. This power estimation approach does not require specifying any parameters. However, since only significant associations are considered for replication, the observed effect sizes of those associations tend to be overestimated [5]. This phenomenon is known as the "winner's curse" [6], and it makes the estimated powers tend toward higher values.
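The inflation can be illustrated with a small simulation sketch (the numbers here are hypothetical, chosen only for illustration): draws of a z-statistic around a modest true effect are filtered by a significance threshold, and the mean of the surviving draws exceeds the truth.

```python
import random

random.seed(0)
true_mu, sigma, threshold = 1.0, 1.0, 2.0  # hypothetical effect, SE, cutoff

# repeated studies of the same SNP: observed z-statistics around the true effect
draws = [random.gauss(true_mu, sigma) for _ in range(100000)]
# only "significant" results are carried forward to replication
significant = [z for z in draws if z > threshold]

mean_all = sum(draws) / len(draws)          # close to the true effect
mean_sig = sum(significant) / len(significant)  # inflated by selection
print(round(mean_all, 2), round(mean_sig, 2))
```

Conditioning on significance truncates the lower tail of the sampling distribution, which is exactly why plugging observed effect sizes into a power formula overstates the replication power.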
Many methods have been proposed to overcome the winner's curse in effect size estimation. An incomplete list includes conditional maximum likelihood estimation (CMLE, [7–9]), the bootstrap [10], a full Bayesian method [11] and an Empirical Bayes method (EB, [12]). Since the power function is usually nonlinear in the effect size, estimators obtained by simply plugging these bias-corrected effect sizes into the power calculation may not achieve the best performance.
Moreover, there are two other challenges in designing a replication study: first, a point estimate of the power alone is not enough; the uncertainty of each estimate should be quantified with an interval estimate. Second, the per-association power estimates need to be summarized into an overall measure, so that the sample size of the replication study can be determined.
This paper aims at addressing the above challenges. Our contributions are listed in the following: (1) we propose an Empirical Bayes (EB) based estimator of the replication study's power for each identified association, which overcomes the winner's curse; (2) we provide a credible interval for each power estimate; (3) we use a weighted average to summarize the per-association powers into the average power of all underlying true associations, which is then used to determine the sample size of the replication study; and (4) we implement the method in an R package, available at http://bioinformatics.ust.hk/RPower.html.
The rest of this paper is organized as follows. In section ‘Methods’, we introduce the Bayesian framework to estimate the power of replication studies. We prove that the Bayesian predictive power is immune to the winner's curse. Then we present how to estimate the power with a two-component mixture prior under the Bayesian framework. We also give the details of hyperparameter estimation, interval estimation and the estimation of the average power. In section ‘Results and discussion’, we first use simulation results to demonstrate that our EB based method is better than other plug-in based estimators in terms of overcoming the winner's curse and providing higher estimation accuracy. We also demonstrate that the coverage probability of the given credible interval is well calibrated. Then we show the sample sizes determined to replicate findings of 6 diseases from the Wellcome Trust Case Control Consortium (WTCCC) [13], which are much higher than the sample sizes estimated by plugging the observed effect sizes into the power calculation formula. The increased sample sizes are reasonable given the winner's curse. In the same section, we discuss the limitations of the current modeling and estimation approach. Section ‘Conclusions’ concludes the paper.
We use the parenthesized superscript "(j)" to denote the primary study (j=1) and the replication study (j=2). For example, we denote the sample size of the primary study as \(n^{(1)}\). The sample sizes of the control group and the case group are \(n_{0}^{(1)}\) and \(n_{1}^{(1)}\), respectively. The total number of SNPs genotyped in the primary study is m. Among those genotyped SNPs, the proportion of SNPs having no association with the disease (null SNPs) is \(\pi_{0}\) (\(0\leq \pi_{0}\leq 1\)).
In both the primary study and the replication study, a contingency table can be created as in Table 1 for each genotyped SNP. With the contingency table, the logarithm of the observed odds ratio reads:
$$ \widehat{\mu}^{(1)}=\log n_{11}^{(1)}-\log n_{10}^{(1)}-\log n_{01}^{(1)}+\log n_{00}^{(1)}. $$

In order to check whether our EB based power estimator \(\widehat{\beta}_{\text{EB}}^{(2)}\) can overcome the winner's curse, the histogram of the differences between estimated values and true values is shown. As a comparison, we first show the corresponding histogram for the power estimator obtained by plugging in the observed effect size. Many methods have been proposed to overcome the winner's curse in terms of effect size estimation. CMLE and EB can be used directly on the z-values of the log-odds ratio test. The individual-level genotype data are also simulated so that the bootstrap based bias reduction method BR2 [10] can be used as a comparison (we modified the implementation code of BR2 so that the log-odds ratio test can be used in the software). A natural idea is to plug these adjusted estimators into the power calculation formula. The corresponding histograms for these three adjusted plug-in based estimators are shown as comparisons. In Fig. 3a, we use the plug-in rule to estimate the replication study's power, where the observed effect size is plugged in. The estimated power is \(\beta^{(2)}\left(\widehat{\mu}^{(1)}\right)\). We plot the histogram of the difference between \(\beta^{(2)}\left(\widehat{\mu}^{(1)}\right)\) and the true power values \(\beta^{(2)}(\mu)\) in the figure. The overestimated effect size makes the estimated replication study's power overestimated as well. Figure 3b plots the histogram of the difference between \(\beta^{(2)}\left(\widehat{\mu}^{(1)}_{\text{CMLE}}\right)\) and \(\beta^{(2)}(\mu)\). The winner's curse has disappeared, but there is a large downward bias in the estimated results. Equation (10) also introduces an Empirical Bayes estimator of the effect size, which reads
$$ \widehat{\mu}^{(1)}_{\text{EB}}=\frac{1}{1+\left(\sigma^{(1)}/\widehat{\sigma}_{\text{alt}}\right)^{2}}\,\widehat{\mu}^{(1)}. $$

Figure 3c, d plot the histograms of \(\beta^{(2)}\left(\widehat{\mu}^{(1)}_{\text{EB}}\right)-\beta^{(2)}(\mu)\) and \(\beta^{(2)}\left(\widehat{\mu}^{(1)}_{\text{BR2}}\right)-\beta^{(2)}(\mu)\), respectively. Large upward biases still exist in the histograms. In contrast, Fig. 3e plots the histogram of \(\widehat{\beta}^{(2)}_{\text{EB}}-\beta^{(2)}(\mu)\), where \(\widehat{\beta}^{(2)}_{\text{EB}}\) is our proposed EB based power estimator. The bias has almost disappeared, indicating that \(\widehat{\beta}^{(2)}_{\text{EB}}\) is better than the other estimators of the replication study's power in terms of overcoming the winner's curse. The experiment was run 5 times, and the same conclusion holds in each run. The empirical biases of these five estimators can be seen in Table 2.
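As an illustrative sketch, the shrinkage above can be computed directly once \(\sigma^{(1)}\) and an estimate of \(\sigma_{\text{alt}}\) are available (the function name and numeric values below are hypothetical):

```python
def eb_effect_size(mu_hat, sigma1, sigma_alt):
    """Shrink the observed log-odds ratio toward zero.

    The shrinkage factor is 1 / (1 + (sigma1 / sigma_alt)^2): a small
    sigma_alt (weak prior signal) means strong shrinkage, a small
    standard error sigma1 means little shrinkage.
    """
    lam = 1.0 / (1.0 + (sigma1 / sigma_alt) ** 2)
    return lam * mu_hat

# precise primary study (small SE): nearly no shrinkage, ~0.495
print(eb_effect_size(0.5, sigma1=0.1, sigma_alt=1.0))
# noisy primary study (large SE): heavy shrinkage toward zero
print(eb_effect_size(0.5, sigma1=1.0, sigma_alt=0.5))
```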
To check the performance of our method when the effect sizes of the associated SNPs do not follow a normal distribution, we also simulated data with effect sizes drawn from the following distributions:
$$ \mu \sim 0.9\,\delta_{0}+0.1\,t_{5,0.2} $$
$$ \mu \sim 0.9\,\delta_{0}+0.07\,N(0,0.04)+0.03\,N(0,0.16), $$
where \(t_{5,0.2}\) is a scaled t distribution with 5 degrees of freedom and scaling factor 0.2. In the second case, the distribution of the associated SNPs' effect sizes follows a Gaussian mixture model. The average empirical biases and RMSEs of all estimators in these two cases are shown in Tables 5 and 6, respectively. From the tables, we can see that our method is still better in terms of overcoming the winner's curse and providing higher estimation accuracy.
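For reference, the two non-normal priors can be sampled as in the following sketch (assuming the mixture weights read as probabilities and the Gaussian components' parameters as variances; the helper names are ours):

```python
import random

random.seed(1)

def draw_t_mixture():
    """0.9*delta_0 + 0.1 * (scaled t_5): point mass at zero plus a 0.2-scaled t."""
    if random.random() < 0.9:
        return 0.0
    # t_5 draw constructed as N(0,1) / sqrt(chi2_5 / 5)
    z = random.gauss(0.0, 1.0)
    chi2_5 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(5))
    return 0.2 * z / (chi2_5 / 5.0) ** 0.5

def draw_gauss_mixture():
    """0.9*delta_0 + 0.07*N(0, 0.04) + 0.03*N(0, 0.16), variances -> sd 0.2, 0.4."""
    u = random.random()
    if u < 0.9:
        return 0.0
    if u < 0.97:
        return random.gauss(0.0, 0.2)
    return random.gauss(0.0, 0.4)

mus = [draw_t_mixture() for _ in range(100000)]
null_fraction = sum(m == 0.0 for m in mus) / len(mus)
print(round(null_fraction, 2))  # fraction of exact zeros, near 0.9
```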
Table 7 The estimated hyperparameters \(\pi_{0}\) and \(\sigma_{\text{alt}}^{2}\) for 6 diseases of the WTCCC dataset

Table 8 Sample size of the replication study needed for 6 diseases of the WTCCC dataset when the average power is estimated by the EB based method. The control-to-case ratio of the replication study is set to 1. The significance levels used in the primary study and the replication study are \(\alpha_{1}=5\times 10^{-8}\) and \(\alpha_{2}=5\times 10^{-6}\), respectively
As a comparison, we also plot the relationships between \(\bar{\beta}^{(2)}\left(\widehat{\mu}^{(1)}\right)\) and \(n^{(2)}\) in Fig. 5, where \(\bar{\beta}^{(2)}\left(\widehat{\mu}^{(1)}\right)\) is the average power estimated by plugging in the observed effect sizes. For a given sample size \(n^{(2)}\), the average power estimated using the EB based method is much smaller than \(\bar{\beta}^{(2)}\left(\widehat{\mu}^{(1)}\right)\). This is reasonable because \(\bar{\beta}^{(2)}\left(\widehat{\mu}^{(1)}\right)\) is overestimated due to the winner's curse, which is alleviated in the EB based method. To achieve 80% average power in the replication study when using \(\bar{\beta}^{(2)}\left(\widehat{\mu}^{(1)}\right)\) as the estimator of average power, the sample size needed is 3023 for coronary artery disease, 3369 for Crohn's disease, 3788 for hypertension, 2748 for rheumatoid arthritis, 2706 for type 1 diabetes and 3095 for type 2 diabetes. The sample sizes needed for other values of average power are listed in Table 9. These sample sizes are much smaller than those determined by the EB based method, indicating that an underpowered study will be designed if we estimate power with the observed effect sizes.
Table 9 Sample size of the replication study needed for 6 diseases of the WTCCC dataset when the average power is estimated by plugging in the observed effect sizes. The control-to-case ratio of the replication study is set to 1. The significance levels used in the primary study and the replication study are \(\alpha_{1}=5\times 10^{-8}\) and \(\alpha_{2}=5\times 10^{-6}\), respectively
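The sample-size determination summarized in Tables 8 and 9 amounts to inverting a monotone average-power curve in \(n^{(2)}\). A minimal sketch with a toy per-association power function (not the paper's log-odds ratio formula; the effect sizes and names below are hypothetical):

```python
from statistics import NormalDist

def avg_power(n, effects, alpha2=5e-6):
    """Toy average power: two-sided normal test, SE shrinking as 1/sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha2 / 2)
    return sum(NormalDist().cdf(abs(mu) * n ** 0.5 - z_crit)
               for mu in effects) / len(effects)

def min_sample_size(effects, target=0.80, lo=10, hi=100000):
    """Smallest n reaching the target average power, found by bisection
    (average power is monotone increasing in n)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if avg_power(mid, effects) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

# hypothetical standardized effect sizes for three identified associations
n_needed = min_sample_size([0.1, 0.15, 0.2])
print(n_needed)
```

Because the average power only increases with the sample size, bisection reliably finds the smallest \(n^{(2)}\) reaching the target, whatever power function is substituted.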
For coronary artery disease and type 2 diabetes, we obtained the publicly available summary statistics of the meta-analyses from two consortia: the CARDIoGRAMplusC4D Consortium [22] and the DIAGRAM Consortium [23], respectively. The CARDIoGRAM GWAS is a meta-analysis of 22 GWAS of European descent involving 22,233 cases and 64,762 controls. The odds ratios calculated from the high-powered CARDIoGRAM GWAS are used as the underlying true odds ratios to calculate the average power of the replication study for coronary artery disease in WTCCC. The average power obtained in this manner is denoted as \(\bar{\beta}^{(2)}\left(\hat{\mu}_{\text{meta}}\right)\). Figure 5a plots the relationship between \(\bar{\beta}^{(2)}\left(\hat{\mu}_{\text{meta}}\right)\) and \(n^{(2)}\), the sample size needed in the replication study. The figure shows that our EB based power estimator \(\bar{\beta}_{\text{EB}}^{(2)}\) is very close to the power calculated using the results of the CARDIoGRAM GWAS. Moreover, \(\bar{\beta}^{(2)}\left(\hat{\mu}_{\text{meta}}\right)\) falls within the credible interval we estimated. The DIAGRAM GWAS is a meta-analysis consisting of 12,171 type 2 diabetes cases and 56,862 controls across 12 GWAS from European descent populations. Similar to the CARDIoGRAM GWAS, the allele-based odds ratios calculated from the DIAGRAM GWAS are used to calculate the average power of the replication study for type 2 diabetes in WTCCC. Figure 5f plots the relationship between \(\bar{\beta}^{(2)}\left(\hat{\mu}_{\text{meta}}\right)\) and \(n^{(2)}\). The result estimated by our EB based method \(\bar{\beta}_{\text{EB}}^{(2)}\) is close to the power calculated using the results of the DIAGRAM GWAS.
If the local true discovery rates \(\text{ltdr}^{(1)}\) are at nearly the same level for all identified associations in the primary study, the variance of the average power will be inversely proportional to the number of associations. When the number of identified associations in the primary study is small, the credible interval for the average power is rather wide. This is illustrated in the study of hypertension, where only 1 association shows genome-wide significance. From Fig. 5c, we can see that the credible interval is rather wide. If we take the credible interval into account in this situation, the required sample size can be drastically increased.
We propose to design replication studies under the case-control setting where the log-odds ratio test is used. The method can also be generalized to other tests within the z-test scheme, such as the regression slope test used for quantitative traits.
As described in [7], the winner's curse depends strongly on the power of the primary study. For a high-powered primary study, most non-null SNPs will yield significant associations after random draws from the population, so the bias will be small in this case. More and more high-powered studies of common diseases are being conducted using pooling or meta-analysis strategies, but high-powered studies for rare diseases remain limited. Hence, it is still helpful and necessary to propose a design procedure for the replication study that takes the winner's curse into consideration.
With the development of cost-effective sequencing techniques, the targets of association studies have extended from common variants to rare variants. A commonly used strategy to discover associations with rare variants is the collapsing method [24], in which several rare variants in a certain group are pooled together to enrich the signal. For each group, a "super variant" is constructed. If the log-odds ratio test is adopted to test the association between the "super variant" and the disease, our method can be used directly to design the replication study.
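A minimal sketch of the collapsing step and the subsequent log-odds ratio computation (the data layout and function names are hypothetical):

```python
from math import log

def collapse(variant_rows):
    """Pool rare variants in a group: a person carrying any rare allele
    counts as a carrier of the constructed 'super variant'."""
    return [int(any(alleles)) for alleles in zip(*variant_rows)]

def log_odds_ratio(carrier, is_case):
    """Observed log-odds ratio from the 2x2 table of carrier status vs
    disease status: log n11 - log n10 - log n01 + log n00."""
    n = [[0, 0], [0, 0]]
    for c, d in zip(carrier, is_case):
        n[d][c] += 1
    return log(n[1][1]) - log(n[1][0]) - log(n[0][1]) + log(n[0][0])

# three people, two rare variants: persons 1 and 2 carry at least one allele
print(collapse([[0, 1, 0], [1, 0, 0]]))  # [1, 1, 0]
```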
Some limitations of our approach need to be mentioned.
Replication studies are commonly used to verify findings discovered in GWAS. Power analysis is essential in designing a replication study. The traditional approach does not extract information from the primary study, and it requires users to specify a parameter \(\mu_{\min}\), which is subjective. Estimating power from the primary study's results may address this problem, but several challenges arise in power estimation: the winner's curse, credible interval construction, and summarization.
In this paper, we propose an EB based power estimation method to resolve these challenges. Simulation experiments show that our approach is better than other plug-in based approaches in terms of overcoming the winner's curse and providing higher estimation accuracy. We also use simulation experiments to demonstrate that the constructed credible interval is well calibrated. As an application example, we use our approach to determine the sample sizes needed for 6 diseases in the WTCCC dataset. Our approach provides an objective way to design replication studies using information extracted from the primary study.
The Bayes risk \(R(\widehat{\eta})\) can be derived as follows:
The last equality holds by Fubini's theorem.
From the last equality, it can be seen that the Bayesian predictive power \(\eta^{(2)}\) is the minimizer of the expression in braces for each value of \(z^{(1)}\). Hence \(\eta^{(2)}\) is also the minimizer of \(R(\widehat{\eta})\).
The following property of the multivariate Gaussian distribution is proved in Chapter 2 of [25]; it can be used to derive \(\eta^{(2)}\).
If \(Z \mid \mu \sim N_{p}(\mu,\Sigma)\) and \(\mu \sim N_{p}(\mu_{0},\Sigma_{0})\), then
$$ \mu \mid Z \sim N_{p}\left(W\mu_{0}+(I-W)Z,\ (I-W)\Sigma\right), $$
where \(W=\Sigma(\Sigma_{0}+\Sigma)^{-1}\).
Because \(z^{(1)} \sim N\left(\mu/\sigma^{(1)},1\right)\) and \(\left(\mu \mid \mathcal{H}_{1}\right) \sim N\left(0, \sigma_{\text{alt}}^{2}\right)\), the following can be obtained by using Property 1:
$$ \left(\mu \mid z^{(1)},\mathcal{H}_{1}\right)\sim N\left(\lambda \hat{\mu}^{(1)},\ \lambda \left(\sigma^{(1)}\right)^{2}\right), $$
where \(\lambda =\frac{1}{1+\left(\sigma^{(1)}/\sigma_{\text{alt}}\right)^{2}}\) is a shrinkage factor. Under \(\mathcal{H}_{1}\), the posterior distribution of \(Z^{(2)}\) is
$$ \left(Z^{(2)} \mid z^{(1)},\mathcal{H}_{1}\right)\sim N\left(\frac{\lambda \hat{\mu}^{(1)}}{\sigma^{(2)}},\ 1+\lambda\left(\frac{\sigma^{(1)}}{\sigma^{(2)}}\right)^{2}\right). $$
Then the Bayesian predictive power of the replication study reads:
$$ \eta^{(2)}= \Phi\left(\frac{\lambda\hat{\mu}^{(1)}/\sigma^{(2)}-\Phi^{-1}(1-\alpha_{2}/2)}{\sqrt{1+\lambda\left(\sigma^{(1)}/\sigma^{(2)}\right)^{2}}}\right) $$
for \(\hat{\mu}^{(1)}>0\) (the case \(\hat{\mu}^{(1)}<0\) is symmetric), where \(\Phi(x)\) is the cumulative distribution function (cdf) of \(N(0,1)\).
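The predictive power can be evaluated numerically as in the following sketch (assuming a two-sided significance level \(\alpha_2\) and taking the direction of the primary finding as the replication success direction; the function and parameter names are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def predictive_power(mu_hat1, sigma1, sigma2, sigma_alt, alpha2):
    """Bayesian predictive power sketch for the direction of the primary
    finding (|mu_hat1| used, so both signs are handled symmetrically)."""
    lam = 1.0 / (1.0 + (sigma1 / sigma_alt) ** 2)     # shrinkage factor
    z_crit = NormalDist().inv_cdf(1 - alpha2 / 2)      # replication cutoff
    mean = lam * abs(mu_hat1) / sigma2                 # predictive mean of Z^(2)
    sd = sqrt(1.0 + lam * (sigma1 / sigma2) ** 2)      # predictive sd of Z^(2)
    return NormalDist().cdf((mean - z_crit) / sd)

# strong, precisely estimated effect: replication nearly certain
p_hi = predictive_power(1.0, sigma1=0.1, sigma2=0.1, sigma_alt=1.0, alpha2=5e-6)
# zero observed effect: replication very unlikely
p_lo = predictive_power(0.0, sigma1=0.1, sigma2=0.1, sigma_alt=1.0, alpha2=5e-6)
```

Note how the predictive standard deviation exceeds 1: the posterior uncertainty about \(\mu\) is propagated into the replication test statistic, which is what distinguishes the predictive power from a naive plug-in power.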
By using Property 1, the marginal distribution of \(Z^{(1)}\) is
$$ Z^{(1)}\sim \pi_{0}\, N(0,1)+(1-\pi_{0})\, N\left(0,\ 1+\left(\frac{\sigma_{\text{alt}}}{\sigma^{(1)}}\right)^{2}\right), $$
which is a two-component Gaussian mixture model. Hence, the square of \(Z^{(1)}\) is distributed as
$$ \left(Z^{(1)}\right)^{2}\sim \pi_{0}\,\chi^{2}_{1}+(1-\pi_{0}) \left(1+\left(\frac{\sigma_{\text{alt}}}{\sigma^{(1)}}\right)^{2}\right)\chi^{2}_{1}, $$
where \(\chi^{2}_{1}\) is the \(\chi^{2}\) distribution with 1 degree of freedom. The expectation reads
$$ E\left(\left(Z^{(1)}\right)^{2}\right)=\pi_{0}+(1-\pi_{0}) \left(1+\left(\frac{\sigma_{\text{alt}}}{\sigma^{(1)}}\right)^{2}\right). $$
By summing over the test statistics of all SNPs, we can obtain
$$ \frac{1}{m}\sum_{i=1}^{m}\left(z_{i}^{(1)}\right)^{2}\approx \pi_{0}+(1-\pi_{0}) \left(1+\left(\frac{\sigma_{\text{alt}}}{\sigma^{(1)}}\right)^{2}\right), $$
which introduces an estimator of \(\sigma_{\text{alt}}^{2}\):
$$ \widehat{\sigma}_{\text{alt}}^{2}=\frac{\left(\sigma^{(1)}\right)^{2}}{1-\pi_{0}}\left(\frac{1}{m}\sum_{i=1}^{m}\left(z_{i}^{(1)}\right)^{2}-1\right). $$
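A sketch of this moment-based estimation on simulated z-values (assuming \(\pi_0\) is known and a common standard error \(\sigma^{(1)}\) across SNPs, which is a simplification):

```python
import random

random.seed(2)

def estimate_sigma_alt_sq(z_values, pi0, sigma1):
    """Method-of-moments estimator: E(Z^2) = 1 + (1 - pi0) * (sigma_alt / sigma1)^2,
    so sigma_alt^2 = sigma1^2 * (mean(z^2) - 1) / (1 - pi0), clipped at 0."""
    mean_z2 = sum(z * z for z in z_values) / len(z_values)
    return sigma1 ** 2 * max(mean_z2 - 1.0, 0.0) / (1.0 - pi0)

# simulate primary-study z-values from the two-component marginal above
pi0, sigma1, sigma_alt = 0.9, 0.1, 0.2
sd_alt = (1.0 + (sigma_alt / sigma1) ** 2) ** 0.5  # marginal sd under H1
z = [random.gauss(0.0, 1.0 if random.random() < pi0 else sd_alt)
     for _ in range(200000)]
est = estimate_sigma_alt_sq(z, pi0, sigma1)
print(round(est, 3))  # recovers a value near sigma_alt^2 = 0.04
```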
With Eq. (27), the local true discovery rate of the primary study reads:
$$ \text{ltdr}^{(1)}\left(z^{(1)}\right)=\frac{(1-\pi_{0})\,\frac{1}{\tau}\,\phi\left(z^{(1)}/\tau\right)}{\pi_{0}\,\phi\left(z^{(1)}\right)+(1-\pi_{0})\,\frac{1}{\tau}\,\phi\left(z^{(1)}/\tau\right)},\qquad \tau=\sqrt{1+\left(\frac{\sigma_{\text{alt}}}{\sigma^{(1)}}\right)^{2}}, $$
where ϕ(x) is the probability density function (pdf) of N(0,1).
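The local true discovery rate under this two-component mixture can be evaluated as in the following sketch (the function name and parameter values are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def ltdr(z1, pi0, sigma1, sigma_alt):
    """Local true discovery rate: posterior probability of H1 given z^(1),
    under the two-component Gaussian mixture marginal of Z^(1)."""
    phi = NormalDist().pdf
    tau = sqrt(1.0 + (sigma_alt / sigma1) ** 2)  # sd of Z^(1) under H1
    f0 = phi(z1)                # null marginal density
    f1 = phi(z1 / tau) / tau    # alternative marginal density
    return (1.0 - pi0) * f1 / (pi0 * f0 + (1.0 - pi0) * f1)

# a z-value near zero is almost surely null; a large one almost surely not
print(ltdr(0.0, pi0=0.9, sigma1=0.1, sigma_alt=0.2))
print(ltdr(6.0, pi0=0.9, sigma1=0.1, sigma_alt=0.2))
```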
This paper was partially supported by the theme-based research project T12-402/13N of the Hong Kong Research Grants Council (RGC). The publication costs for this article were partly funded by the same grant.