dc.contributor.advisor: Yang, Qiong [en_US]
dc.contributor.author: Li, Shuo [en_US]
dc.date.accessioned: 2019-12-16T20:12:06Z
dc.date.available: 2019-12-16T20:12:06Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/2144/38977
dc.description.abstract: Correlation is commonly present in genetic association studies and can lead to incorrect inference when ignored. Hence, developing methods for properly analyzing correlated data is crucial. However, analytical tools for certain questions are lacking because existing methods are not applicable when some of their model assumptions are violated. In this thesis, we propose three methods for correlated phenotypes, particularly correlation arising from family data. We first develop an iterated weighted linear mixed effects (IWLME) method to account for heteroscedasticity. We compare the performance of IWLME with that of five other methods in simulation studies. For methods that ignore heteroscedasticity, its presence reduces power but does not inflate type I error. When heteroscedasticity is present, meta-analysis, linear mixed effects (LME) models in GENetic EStimation and Inference in Structured samples (GENESIS), weighted LME, and IWLME estimate the effect size more precisely, with smaller bias and mean squared error, than LME and generalized estimating equations (GEE). In an epigenome-wide association study, applying IWLME yields more CpGs that reach the significance threshold than applying LME. We then explore R² statistics in LME, defining R² as the proportion of the variance in the response that is predictable from the fixed effect variables. We review six existing R² estimators and extend them to estimate partial R². We propose three R²/partial R² estimators based on our R² definition and variance decomposition, and compare the performance of all the estimators in simulation studies. Our proposed R² estimators have the smallest mean squared error, low bias, and few or no negative estimates when the true R²/partial R² is modest or higher (>2%). Finally, we propose a Firth bias corrected generalized estimating equations (FBC-GEE) approach to address separation in correlated binary data, a common occurrence in association analyses of rare genetic variants. We compare GEE, FBC-GEE, Firth logistic regression, and Scalable and Accurate Implementation of GEneralized mixed model (SAIGE) in simulation studies. FBC-GEE helps reduce type I error inflation compared with GEE. With these projects, we develop new methodologies and improve the understanding of the performance of available methods for genetic studies with family data. [en_US]
dc.language.iso: en_US
dc.subject: Biostatistics [en_US]
dc.title: Methods for correlated observations with applications to genetic association studies [en_US]
dc.type: Thesis/Dissertation [en_US]
dc.date.updated: 2019-11-13T20:01:29Z
etd.degree.name: Doctor of Philosophy [en_US]
etd.degree.level: doctoral [en_US]
etd.degree.discipline: Biostatistics [en_US]
etd.degree.grantor: Boston University [en_US]
dc.identifier.orcid: 0000-0003-2331-2448

