Association analysis and clustering of rare variants with disease phenotypes

Abstract
Hundreds of thousands of human deoxyribonucleic acid (DNA) samples have been whole-genome sequenced, identifying numerous rare variants in both the nuclear genome and the mitochondrial genome (mtDNA). Each cell contains multiple mtDNA molecules, and mtDNA heteroplasmy is the presence of two or more different nucleotides at the same mtDNA position within an individual. Most heteroplasmic variants are extremely rare, which makes it challenging to apply traditional analytic approaches to association testing of heteroplasmy. Separately, clustering disease-associated rare variants in a gene region (e.g., classifying them into null, positively associated, or negatively associated groups) provides useful information for investigating the biological mechanisms that link rare variants to disease traits. However, few studies have investigated the clustering of rare variants. To fill these knowledge gaps, this dissertation focuses on association analysis and clustering of rare variants. In project 1, we develop a comprehensive framework for association testing of heteroplasmy and evaluate it using both simulated and real data. In project 2, we propose a method based on a Gaussian mixture model (GMM) for clustering trait-associated rare variants and apply it to a real dataset. We also assess, in simulation studies, the effect of linkage disequilibrium (LD) on the performance of the clustering method. In project 3, we apply the framework developed in project 1 to association analysis of heteroplasmy with cardiometabolic diseases (CMDs) in six TOPMed cohorts, with the aim of identifying CMD-associated heteroplasmic gene regions. Knowledge gained from these three projects will help clarify the role of rare genetic variants in the etiology of complex human diseases.
Description
2023