Multiple testing & optimization-based approaches with applications to genome-wide association studies
Posner, Daniel Charles
Many phenotypic traits are heritable, but their exact genetic causes are difficult to determine. A common approach for disentangling the different genetic factors is to conduct a "genome-wide association study" (GWAS), in which each single nucleotide variant (SNV) is tested for association with a trait of interest. GWAS have identified many SNVs associated with complex traits, but to date these SNVs explain only a fraction of the heritability of those traits. In this dissertation, we propose novel optimization-based and multiple testing procedures for variant set tests. In the second chapter, we propose a novel variant set test, convex-optimized SKAT (cSKAT), that leverages multiple SNV annotations. The test generalizes SKAT to convex combinations of SKAT statistics constructed from functional genomic annotations. We differ from previous approaches by optimizing kernel weights with a multiple kernel learning algorithm. In cSKAT, the contribution of each variant to the overall statistic is a product of annotation values and kernel weights for annotation classes. We demonstrate the utility of our biologically informed SNV weights in a rare-variant analysis of fasting glucose in the Framingham Heart Study (FHS). In the third chapter, we propose a sequential testing procedure for GWAS that combines tests of single SNVs with tests of groups of SNVs (SNV-sets) that share a common biological function. The proposed procedure differs from previous procedures by testing genes and sliding 4 kb intergenic windows rather than chromosomes or the whole genome. We also sharpen an existing tree-based multiple testing correction by incorporating correlation between SNVs, which is present in any SNV-set containing contiguous regions (such as genes). In the fourth chapter, we present a sequential testing procedure for SNV-sets that incorporates correlation between the test statistics of the SNV-sets.
At each step of the procedure, the multiplicity correction is the number of remaining independent tests, and the procedure makes no assumption about the null distribution of the tests. We provide an estimator for the number of remaining independent tests based on previous work in single-SNV GWAS and demonstrate that the estimator is valid for sequential procedures. We implement the proposed method for GWAS by sequentially testing chromosomes, genes, 4 kb windows, and SNVs.
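As an illustration of the cSKAT description in the abstract, the following minimal sketch shows how a convex combination of annotation-weighted SKAT-style statistics reduces to a single statistic in which each variant is weighted by the product of its annotation values and the kernel weights. This is not the dissertation's implementation. The function names, the intercept-only null model, the use of annotation values directly as squared variant weights, and the fixed kernel weights `eta` are all assumptions made for illustration; the multiple kernel learning step that optimizes the kernel weights is not shown.

```python
import numpy as np


def score_contributions(G, y):
    """Squared per-variant score statistics, (g_j' r)^2.

    Assumption: an intercept-only null model, so residuals are y - mean(y).
    """
    resid = y - y.mean()
    return (G.T @ resid) ** 2


def combined_statistic(G, y, A, eta):
    """Convex combination of annotation-weighted SKAT-style statistics.

    G:   (n, p) genotype matrix
    A:   (p, K) non-negative annotation values, one column per annotation class
    eta: (K,) kernel weights, non-negative and summing to 1 (hypothetical,
         fixed here rather than learned)
    Returns the combined statistic and the K per-class statistics.
    """
    eta = np.asarray(eta, dtype=float)
    if np.any(eta < 0) or not np.isclose(eta.sum(), 1.0):
        raise ValueError("eta must be non-negative and sum to 1")
    s2 = score_contributions(G, y)
    # Q_k = r' G diag(a_k) G' r = sum_j a_jk * (g_j' r)^2
    per_class = A.T @ s2
    return float(eta @ per_class), per_class


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p, K = 200, 10, 3
G = rng.binomial(2, 0.05, size=(n, p)).astype(float)  # rare-variant genotypes
y = rng.normal(size=n)                                # trait values
A = rng.uniform(size=(p, K))                          # annotation values
eta = np.array([0.5, 0.3, 0.2])                       # kernel weights

Q, Q_k = combined_statistic(G, y, A, eta)

# Equivalent view: one statistic with per-variant weights sum_k eta_k * a_jk.
variant_weights = A @ eta
Q_alt = float(variant_weights @ score_contributions(G, y))
```

In this sketch `Q` and `Q_alt` agree up to floating-point error, which is the sense in which each variant's contribution is a product of annotation values and kernel weights. Computing a p-value for the combined statistic would require its null distribution, which is outside the scope of this sketch.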
Rights: Attribution 4.0 International