Sebastiani, Paola
Song, Zeyuan
2025-02-12
2024
https://hdl.handle.net/2144/49779
2024
This dissertation focuses on the development of advanced multivariate methods for the analysis of genetics and genomics data with multiple sources of correlation. The dissertation describes three novel topics: (1) a method to learn partial correlation networks, also known as Gaussian Graphical Models, to analyze multi-omics data, (2) a sparse network method to reduce network complexity, and (3) a Genome-Wide Association Study pipeline to analyze genome-wide genotype data in longitudinal and familial settings. In the first part of my dissertation, I propose a cluster-based Bootstrap algorithm for learning Gaussian Graphical Models from correlated data. Extensive simulations in family-based studies show that the Bootstrap algorithm effectively controls Type I errors without compromising statistical power compared to alternative solutions. Additionally, the algorithm is applied to learn the partial correlation networks of 47 Polygenic Risk Scores generated from genome-wide genotype data in the Long Life Family Study, to unveil the complex relationships among these Polygenic Risk Scores. The second part of the dissertation extends the Bootstrap algorithm to learn sparse Gaussian Graphical Models in correlated data. Simulation studies show that this extended Bootstrap algorithm maintains control over the Type I errors. As the value of the tuning parameter varies, the dynamic changes of the networks reveal their contraction and dissection as edges with small partial correlations are systematically removed. The application of this method in real data analysis identifies meaningful clusters in the dynamic changes of the Polygenic Risk Scores and lipids networks.
In the third part, I develop a Nextflow Genome-Wide Association Study pipeline, providing a fully automated analysis tool for managing, analyzing, and visualizing genome-wide genotype data for continuous and binary traits with correlated genetics data. Applying this pipeline to investigate processing speed in the Long Life Family Study leads to the identification of 17 rare protective Single Nucleotide Polymorphisms located in or near the Retinoic Acid Receptor Beta and Thyroid Hormone Receptor Beta genes on chromosome 3. These findings shed light on potential mechanisms supporting the preservation of processing speed in aging individuals.
en-US
Biostatistics; Bootstrap; Correlated data; Gaussian Graphical Models; Genetics; Genomics; Multi-omics data integration
Robust methods for multivariate analysis of correlated genetics and genomics data
Thesis/Dissertation
2025-02-11
0000-0002-7352-4177