Bayesian methods for multivariate modeling of pleiotropic single-nucleotide polymorphisms and genetic risk prediction
Date
2012
DOI
Authors
Hartley, Stephen William
Version
Embargo Date
Indefinite
OA Version
Citation
Abstract
Genome-wide association studies (GWAS) have identified numerous associations between genetic loci and individual phenotypes; however, relatively few GWAS have attempted to detect pleiotropic associations, in which loci are simultaneously associated with multiple distinct phenotypes.
In this thesis, we show that pleiotropic single-nucleotide polymorphism (SNP) associations can be directly modeled via the construction of simple Bayesian networks, and that these models can be applied to produce Bayesian classifiers that leverage pleiotropy to improve genetic risk prediction. We then demonstrate the effectiveness of these methods in both simulated and real data.
The proposed method includes two phases: first, SNPs are fitted to models and ranked by the strength of evidence of association; second, the final feature set and classification rule are selected using cross-validated prediction. The final classifiers can then be used to test the validity of the candidate genes as well as for diagnostic and prognostic purposes. Multiple genetic risk prediction methods were developed and tested. Multiple phenotypes can be predicted jointly, or alternatively, a phenotype of interest can be predicted, either conditionally given known secondary phenotype status, or marginally across unknown secondary phenotype statuses. Furthermore, prediction can be carried out using either single classifiers or ensembles of classifiers.
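The difference between conditional and marginal prediction can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a toy network in which one SNP genotype depends on a primary and a secondary binary phenotype, and every probability and function name below is hypothetical.

```python
# Hypothetical toy model: every number below is invented for illustration.
# PRIOR[(p1, p2)] is the joint probability of the primary phenotype p1 and
# the secondary phenotype p2 (both binary).
PRIOR = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}

# GENO[(p1, p2)][g] is P(genotype = g | p1, p2), where g counts copies of
# the minor allele (0, 1, or 2).
GENO = {
    (0, 0): [0.6, 0.3, 0.1],
    (0, 1): [0.5, 0.4, 0.1],
    (1, 0): [0.4, 0.4, 0.2],
    (1, 1): [0.2, 0.4, 0.4],
}


def joint(p1, p2, g):
    """Joint probability P(p1, p2, g) under the toy network."""
    return PRIOR[(p1, p2)] * GENO[(p1, p2)][g]


def predict_conditional(g, p2):
    """P(p1 = 1 | g, p2): the secondary phenotype status is known."""
    numerator = joint(1, p2, g)
    denominator = joint(0, p2, g) + joint(1, p2, g)
    return numerator / denominator


def predict_marginal(g):
    """P(p1 = 1 | g): the unknown secondary phenotype is summed out."""
    numerator = sum(joint(1, p2, g) for p2 in (0, 1))
    denominator = sum(joint(p1, p2, g) for p1 in (0, 1) for p2 in (0, 1))
    return numerator / denominator
```

In this toy setting, calling `predict_conditional(2, 1)` uses the known secondary status, whereas `predict_marginal(2)` averages over both possible secondary statuses, weighted by how probable each is given the genotype.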
To demonstrate the capabilities and limitations of these methods, several hundred GWAS were simulated under various effect strengths, sample sizes, and phenotype distributions. Multiple prediction methods, search algorithms, and optimization loss functions were tested and compared.
Next, we applied these methods to the Cooperative Study of Sickle Cell Disease (CSSCD) dataset, examining the genetic basis for cerebrovascular accident (CVA) and fetal hemoglobin level (HbF). To demonstrate the effectiveness of the model selection and classification, CVA status was predicted in validation datasets from several other studies.
The model search and classification methods described in this thesis are capable of efficient pleiotropic locus identification and phenotype classification under a variety of conditions. These methods are robust and computationally efficient, providing a powerful new approach for detecting and modeling pleiotropic disease loci.
Description
Thesis (Ph.D.)--Boston University
PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.