Meta-analysis strategies for heterogeneous studies in genome-wide association studies
Meta-analysis is a statistical technique that combines results from multiple independent studies to make inferences about parameters of interest. Although meta-analysis is widely used for parameter estimation and hypothesis testing, approaches that incorporate heterogeneous studies have not been fully developed. When studies are heterogeneous, we do not expect them all to share the same true underlying effect, so applying a fixed-effects model violates its assumption of homogeneous effect sizes. Heterogeneity among studies can arise from multiple sources, such as differences in ancestry between populations, differences in study design, and environmental exposures that modify the effect of the variable of interest. In this thesis, we introduce an analytic strategy and statistical models for meta-analysis of potentially heterogeneous studies. First, we propose a two-stage clustering approach to account for heterogeneity in trans-ethnic meta-analysis of genome-wide association studies (GWAS). Specifically, we cluster studies using cohort-specific genetic information prior to meta-analysis, which accounts for between-cluster heterogeneity and bolsters within-cluster homogeneity. An extensive simulation study shows that this approach improves power and reduces computational burden compared to existing methods for trans-ethnic meta-analysis. Next, under a meta-regression framework, we develop a likelihood ratio test (LRT) statistic to accommodate multiple random effects. We allow for multiple sources of heterogeneity in study characteristics and model them as random effects. In a simulation study, we show that the proposed LRT achieves power similar to or higher than that of existing methods, especially when heterogeneity exists. We apply this new approach to meta-analyze genome-wide association data.
Lastly, we derive a score test in the same context as our proposed LRT and show that it is substantially more computationally efficient than the LRT. The introduced strategy and methodologies can effectively and efficiently aggregate the evidence from potentially heterogeneous studies in statistical genetics and other research areas.
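
To make the contrast between homogeneous and heterogeneous effects concrete, the following is a minimal sketch of two textbook estimators: the inverse-variance fixed-effects estimate and the DerSimonian-Laird random-effects estimate, with Cochran's Q as the heterogeneity statistic. It is not the clustering approach, LRT, or score test proposed in this thesis. The function names and the per-study effect sizes and standard errors are hypothetical and chosen only for illustration.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted estimate; assumes one shared true effect."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effects estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_fe, _ = fixed_effect(betas, ses)
    return sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))


def dersimonian_laird(betas, ses):
    """Random-effects estimate with a method-of-moments between-study variance."""
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    q = cochran_q(betas, ses)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(re_weights)
    beta = sum(w * b for w, b in zip(re_weights, betas)) / total
    return beta, math.sqrt(1.0 / total), tau2


# Hypothetical per-study effect estimates and standard errors for one variant.
betas = [0.0, 0.4]
ses = [0.1, 0.1]

beta_fe, se_fe = fixed_effect(betas, ses)
beta_re, se_re, tau2 = dersimonian_laird(betas, ses)
print(beta_fe, se_fe)
print(beta_re, se_re, tau2)
```

With these two illustrative studies, both estimators return the same pooled effect, but the random-effects standard error (0.2) is much wider than the fixed-effects one (about 0.071) because the between-study variance is estimated to be positive. When the per-study effects are identical, the estimated between-study variance is zero and the two estimators coincide.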