Identifying genetic variants associated with multiple correlated traits and the use of an ensemble of genetic risk models for phenotype prediction and classification
Milton, Jacqueline Nicole
Sickle cell disease is a monogenic blood disorder in which the clinical course and disease severity vary widely among patients. For physicians to make more informed decisions about the treatment and management of the disease, it would be useful to be able to predict disease severity. We focus on two primary modulators of disease severity in sickle cell patients: hemolysis and fetal hemoglobin (HbF). This dissertation evaluates methodology for identifying genetic variants associated with the severity of sickle cell disease and develops new genetic risk prediction methodology for predicting disease severity in sickle cell patients based on levels of HbF. Hemolysis is a trait that is influenced by multiple correlated phenotypes (lactate dehydrogenase, reticulocytes, bilirubin and aspartate transaminase). There are several approaches to the statistical analysis of multiple correlated phenotypes. The first part of this dissertation evaluates the use of principal component analysis (PCA) and compares it to the alternative approach of analyzing each phenotype individually in separate univariate analyses. We focus on whether, and under what conditions, a summarized phenotype from PCA yields more power in a genome-wide association study (GWAS) than conducting multiple individual GWAS. We find that the PCA approach gains more power when the phenotypes are strongly intercorrelated. The second part of this dissertation proposes a novel method of genetic risk prediction for continuous traits using an ensemble of genetic models. We aim to show, through a simulation and through prediction of HbF, that the proposed method is more robust to the inclusion of false positives and yields more stable predictions than computing a genetic risk score (GRS) and 10-fold cross-validation.
The third part of this dissertation introduces a Bayesian-based clustering approach to produce clusters of sickle cell anemia patients based on their "predicted genetic profiles" of HbF. We then examine the genetic profiles of individuals in the extreme clusters to determine which genes contribute more prominently to the genetic profile so that we may potentially identify genes that are highly influential in the regulation of extremely high and low values of HbF.
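
As an illustration of the PCA summarization evaluated in the first part, the following is a minimal sketch of how several correlated phenotypes might be collapsed into a single first-principal-component score that could then serve as the outcome in an association analysis. It is not the dissertation's implementation. The function name `first_pc_scores`, the simulated data, the sample size, and the loadings are all assumptions made for illustration only.

```python
import numpy as np


def first_pc_scores(phenotypes):
    """Return first-principal-component scores and the fraction of
    variance that component explains (hypothetical helper)."""
    Y = np.asarray(phenotypes, dtype=float)
    # Standardize each phenotype so no marker dominates because of its units.
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    # The right singular vectors of Z are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[0]
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained


# Simulated stand-ins for four correlated markers driven by one shared
# latent factor (assumed values, not real patient data).
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
Y = np.column_stack(
    [0.8 * shared + 0.6 * rng.normal(size=n) for _ in range(4)]
)

scores, explained = first_pc_scores(Y)
print(f"PC1 explains {explained:.2f} of the total variance")
```

When the simulated phenotypes share a strong common factor, the first component captures most of their joint variance, which is the setting in which the abstract reports a power gain for the PCA approach. The sign of a principal component is arbitrary, so only the magnitude of its association with other variables is meaningful.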