Some methods improvement and extension to exposure-response estimation with family data

Date
2023
Embargo Date
2027-02-04
Abstract
Ignoring correlation among observations can lead to inaccurate inference, so methods that analyze correlated data correctly are essential. However, existing methods are limited: they cannot answer certain questions when the data are correlated or when some model assumptions are not met. To address these issues, this thesis extends and improves three existing methods to handle correlated observations, specifically those arising from family data. The objective is to provide analytical tools that handle the complexities of familial correlation, making these methods more effective and applicable in real-world settings. We first extend Bayesian factor analysis to account for family structure. Our method simultaneously estimates the covariance matrix and selects batches of correlated predictors while accounting for family structure in the model. We demonstrate its effectiveness through simulation studies and real data analysis: it outperforms existing methods in covariance matrix estimation, regression coefficient estimation, and variable selection, particularly in high-dimensional settings, and when applied to a real dataset it successfully deciphers the true association among a group of correlated metabolites. We then propose a new method, referred to as BKMR-MHMC, to improve the computational efficiency and capacity of Bayesian kernel machine regression (BKMR) for estimating non-linear exposure-response functions and performing variable selection. We also modify hierarchical variable selection using mixed Hamiltonian Monte Carlo (M-HMC) to handle highly correlated predictors. By introducing a random effect, BKMR-MHMC can accommodate complex correlation structures such as family structures.
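To make the random-effect idea concrete, here is a minimal sketch of the marginal covariance implied by a BKMR-type model with a family random effect. It assumes the Gaussian kernel form commonly used in BKMR, K(z, z') = exp(-sum_m r_m (z_m - z'_m)^2), in which setting r_m = 0 drops exposure m from h, together with a simple shared family intercept. The function names, the exchangeable within-family structure, and all parameter values are hypothetical illustrations, not the thesis's BKMR-MHMC specification or sampler.

```python
import numpy as np

def gaussian_kernel(Z, r):
    """BKMR-style kernel: K[i, j] = exp(-sum_m r[m] * (Z[i, m] - Z[j, m])**2)."""
    diff = Z[:, None, :] - Z[None, :, :]            # (n, n, M) pairwise differences
    return np.exp(-np.einsum("ijm,m->ij", diff ** 2, r))

def family_indicator(family_ids):
    """F[i, j] = 1 if subjects i and j share a family ID, else 0."""
    ids = np.asarray(family_ids)
    return (ids[:, None] == ids[None, :]).astype(float)

def marginal_cov(Z, family_ids, r, tau, sigma2_fam, sigma2):
    """Cov(y) for the hypothetical model y = h(z) + b_family + e, with
    h ~ N(0, tau * K), b_family ~ N(0, sigma2_fam), e ~ N(0, sigma2 * I)."""
    n = Z.shape[0]
    return (tau * gaussian_kernel(Z, r)
            + sigma2_fam * family_indicator(family_ids)
            + sigma2 * np.eye(n))

# Hypothetical example: 6 subjects in 3 families, 3 exposures.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))
families = [1, 1, 1, 2, 2, 3]
r = np.array([1.0, 0.5, 0.0])  # r[2] = 0 removes exposure 3 from h
Sigma = marginal_cov(Z, families, r, tau=1.0, sigma2_fam=0.5, sigma2=0.25)
y = rng.multivariate_normal(np.zeros(6), Sigma)  # one simulated outcome vector
```

Because members of the same family share the sigma2_fam term, their outcomes remain correlated even when their exposure profiles are far apart, which is the kind of dependence a model without the random effect would ignore.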
We show through simulation and real data analyses that the proposed BKMR-MHMC method outperforms both the original BKMR method and its accelerated version in convergence speed and accuracy for high-dimensional data, in its ability to incorporate highly correlated predictors, and in modeling complex correlation structures. Finally, we extend the generalized additive mixed model to family data with repeated measurements, extending the gamm4 R package to incorporate family data through re-parameterization and transformation. We evaluate the proposed approach in terms of intraclass correlation coefficient (ICC) estimation and prediction accuracy through simulations under various scenarios, as well as on a real dataset from the Framingham Heart Study. The proposed approach estimates the ICC accurately, particularly for dense family structures, and its prediction accuracy surpasses that of the original gamm4 method, particularly when the number of subjects is large.
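The abstract does not detail the re-parameterization used to extend gamm4, so the following is a minimal sketch of one standard technique of this kind, offered as an assumption rather than the thesis's exact transformation. gamm4 estimates its random effects through lme4, which treats the levels of a grouping factor as independent with a common variance. A family-correlated subject effect b ~ N(0, s2 * A), where A is the additive relationship matrix (twice the kinship matrix), can be rewritten as b = L u with A = L L^T (Cholesky) and u ~ N(0, s2 * I), so Z b = (Z L) u becomes an ordinary iid random effect on the transformed design Z L. The pedigree, observation layout, and function name below are hypothetical.

```python
import numpy as np

def reparameterize(Z, A):
    """Turn a correlated random effect into an iid one.

    If b ~ N(0, s2 * A) and A = L @ L.T (Cholesky), then b = L @ u with
    u ~ N(0, s2 * I), so Z @ b = (Z @ L) @ u.
    """
    L = np.linalg.cholesky(A)  # lower-triangular factor of A
    return Z @ L

# Hypothetical nuclear family: father, mother, two children.
# Entries are additive relationships (twice the kinship coefficients).
A = np.array([
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
])

# Hypothetical repeated measurements: subject index for each of 8 observations.
subject = np.array([0, 0, 1, 1, 2, 2, 2, 3])
Z = np.eye(4)[subject]  # 8 x 4 indicator design matrix

Z_star = reparameterize(Z, A)  # design for an iid random effect
```

Because (Z L)(Z L)^T = Z A Z^T, the transformation changes only how the random effect is parameterized, not the covariance it implies, so the variance component s2 keeps its interpretation.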