Novel statistical methods for the assessment and development of polygenic scores in multi-ancestry cohorts

Contributors: Lunetta, Kathryn L.; Gunn, Sophia
Dates: 2024; 2025-03-27
URI: https://hdl.handle.net/2144/49957

Polygenic scores (PGS) have considerable potential to advance biological research and precision medicine. PGS estimate an individual's genetic liability to a trait or disease and have been widely used to identify individuals at high risk of disease. However, current approaches to developing and applying PGS have important limitations. The most important is that PGS performance often declines when scores are applied to populations different from those in which they were derived. Most currently available PGS were developed primarily in European-ancestry populations, and their performance is substantially worse when they are applied to populations underrepresented in genetic research. Evaluating PGS is therefore challenging, because their performance can vary considerably across populations. Although methods have been proposed for building PGS from multi-ancestry data that perform better in underrepresented populations, how best to develop PGS for multi-ancestry populations remains unknown. Further, while PGS can identify individuals at high risk of disease, they do not provide insight into the sources of genetic risk for these individuals. In this dissertation, we aim to offer guidance for navigating and addressing these limitations. Our second chapter introduces methodology that uses correlation-based tests to evaluate the performance of multiple polygenic scores in multiple populations. We show in simulations that our method has appropriate type I error and reasonable power at adequate sample sizes. We then apply our methods to height and low-density lipoprotein cholesterol, providing two examples of how they can be used to analyze the performance of multiple polygenic scores in multiple populations. Our third chapter compares methods for building polygenic scores for multi-ancestry populations using genome-wide association study (GWAS) results from multiple populations.
Using population-specific GWAS results from the Million Veteran Program, we build polygenic scores for five binary and five continuous traits with both ancestry-specific and multi-ancestry approaches and evaluate the scores in three populations in the All of Us (AoU) cohort. Using the statistical framework introduced in the second chapter, we compare these approaches and find that multi-ancestry scores built with PRS-CSx outperform the other approaches in the three AoU populations. Our fourth chapter introduces a method for building pathway-specific polygenic scores (PPGS) with the Bayesian method PRS-CS. In simulations, we demonstrate that this method outperforms PRSet, a previously proposed PPGS approach. We use our proposed methods to derive PPGS for atrial fibrillation (AF) in the UK Biobank and demonstrate the heterogeneity of genetic risk.

Language: en-US
Subjects: Biostatistics; Genetics; Complex traits; Polygenic scores
Type: Thesis/Dissertation
ORCID: 0000-0001-6451-050X
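To make the quantities in this abstract concrete, here is a minimal Python sketch under simplifying assumptions. It treats a PGS as a weighted sum of allele dosages. It then computes the score's Pearson correlation with a trait in each of two simulated populations, with a Fisher z-transform confidence interval. Finally, it forms a pathway-restricted score by summing only over variants flagged as belonging to a gene set. All function names, the simulated data (partly shared effect sizes and different allele frequencies), and the pathway mask are hypothetical. This is not the dissertation's correlation-based test, PRS-CS, PRS-CSx, or its PPGS method; that method derives pathway-specific weights with PRS-CS rather than simply subsetting fixed weights.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of allele dosages for each individual: PGS_i = sum_j w_j * G_ij."""
    return dosages @ weights


def pathway_score(dosages: np.ndarray, weights: np.ndarray, in_pathway: np.ndarray) -> np.ndarray:
    """The same weighted sum, restricted to variants flagged (boolean mask) as in a pathway."""
    return dosages[:, in_pathway] @ weights[in_pathway]


def correlation_with_ci(score: np.ndarray, trait: np.ndarray, z_crit: float = 1.959964):
    """Pearson r between a score and a trait, with a Fisher z-transform 95% confidence interval."""
    r = np.corrcoef(score, trait)[0, 1]
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(len(trait) - 3)
    return r, (np.tanh(z - z_crit * se), np.tanh(z + z_crit * se))


# Hypothetical simulation: two populations whose effect sizes are only partly shared.
# Weights come from population 1, mimicking a score applied outside its training population.
rng = np.random.default_rng(0)
n_people, n_variants = 2000, 200
beta_pop1 = rng.normal(0.0, 0.05, n_variants)
beta_pop2 = 0.5 * beta_pop1 + rng.normal(0.0, 0.05, n_variants)
weights = beta_pop1  # hypothetical: true population-1 effects stand in for estimated weights

results = {}
for name, beta, allele_freq in [("pop1", beta_pop1, 0.3), ("pop2", beta_pop2, 0.2)]:
    # Genotype dosages 0/1/2 drawn independently per variant (no linkage disequilibrium).
    G = rng.binomial(2, allele_freq, size=(n_people, n_variants)).astype(float)
    trait = G @ beta + rng.normal(0.0, 1.0, n_people)
    results[name] = correlation_with_ci(polygenic_score(G, weights), trait)

for name, (r, (lo, hi)) in results.items():
    print(f"{name}: r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")

# Pathway-restricted score on the last simulated population, using a hypothetical
# mask that places the first 50 variants in the pathway.
in_pathway = np.zeros(n_variants, dtype=bool)
in_pathway[:50] = True
pathway_pgs = pathway_score(G, weights, in_pathway)
```

Comparing the two printed intervals informally shows the kind of drop in accuracy that occurs when a score is applied to a population other than the one it was derived in. Formally comparing several, possibly correlated, scores across populations requires dedicated tests such as those developed in the second chapter.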