Evaluating multiple imputation methods for longitudinal healthy aging index: a score variable with data missing due to death, dropout and several missing data mechanisms
The healthy aging index (HAI) is a score variable based on five clinical components. I assess how well it predicts mortality in a sample of older adults from the Framingham Heart Study (FHS). Over 30% of FHS participants have missing HAI values during follow-up, so I investigate how well imputation methods perform in this setting. Using simulations, I compare four methods of multiple imputation (MI) by fully conditional specification (FCS) with the complete case (CC) approach for estimating the means, correlations, and slopes of the HAI over time. I simulate multivariate normal data for each HAI component at four time points, along with age and sex, using within-time and across-time correlation patterns and the proportions of missing data observed in the FHS. The four MI methods are cross-sectional FCS (XFCS; the imputation model uses the other components at the same time point), longitudinal FCS (LFCS; uses the same component at all time points, ignoring cross-component correlation), all FCS (AFCS; uses all components at all time points), and two-fold FCS (2fFCS; uses all components at the current and adjacent time points). I compare percent bias, confidence interval width, coverage probability, and relative efficiency under three missing data mechanisms (missing completely at random, MCAR; missing at random, MAR; missing not at random, MNAR), two sample sizes (n = 1000 and n = 100), and two numbers of imputed datasets (m = 5 and m = 20). All longitudinal methods (LFCS, AFCS, and 2fFCS, but not XFCS) yield nearly identical results, with unbiased estimates of means, correlations, and slopes. Increasing the number of imputations from 5 to 20 yields only small gains in precision and relative efficiency. Finally, I compare the imputation methods and CC analysis in survival models that use HAI as a time-dependent covariate to predict mortality. I simulate HAI data as described above and time to death from piecewise exponential models, and I impose type I and random censoring on 32% of observations. CC analysis reduces the sample size by 10% and produces unbiased estimates, but inflates standard errors.
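The four FCS variants differ only in which variables enter the imputation model for a given component at a given time point. The minimal Python sketch below encodes those predictor sets as the abstract describes them. The component names `c1` to `c5`, the helper names `var` and `predictors`, and the choice to include age and sex in every imputation model are illustrative assumptions, not details taken from the thesis. The 2fFCS entry is simplified to a fixed window of adjacent time points; the full two-fold FCS algorithm additionally alternates within-time and among-time iterations, which this sketch does not model.

```python
from itertools import product

# Hypothetical names for the five HAI components and four measurement occasions.
COMPONENTS = ["c1", "c2", "c3", "c4", "c5"]
TIMES = [1, 2, 3, 4]
# Assumption: age and sex are used as predictors in every imputation model.
COVARIATES = ["age", "sex"]


def var(comp, t):
    """Column name for component `comp` measured at time `t`."""
    return f"{comp}_t{t}"


def predictors(method, comp, t):
    """Variables used to impute component `comp` at time `t` under each FCS variant."""
    if method == "XFCS":
        # Cross-sectional: the other components at the same time point only.
        preds = [var(c, t) for c in COMPONENTS if c != comp]
    elif method == "LFCS":
        # Longitudinal: the same component at all other time points.
        preds = [var(comp, s) for s in TIMES if s != t]
    elif method == "AFCS":
        # All components at all time points, excluding the target itself.
        preds = [var(c, s) for c, s in product(COMPONENTS, TIMES) if (c, s) != (comp, t)]
    elif method == "2fFCS":
        # Simplified two-fold window: all components at the current and adjacent times.
        window = [s for s in TIMES if abs(s - t) <= 1]
        preds = [var(c, s) for c, s in product(COMPONENTS, window) if (c, s) != (comp, t)]
    else:
        raise ValueError(f"unknown method: {method}")
    return COVARIATES + preds


for method in ["XFCS", "LFCS", "AFCS", "2fFCS"]:
    # Number of predictors when imputing c1 at time 2.
    print(method, len(predictors(method, "c1", 2)))  # XFCS 6, LFCS 5, AFCS 21, 2fFCS 16
```

In an FCS package such as R's mice, sets like these would typically be encoded as rows of a predictor matrix (the `predictorMatrix` argument); how the thesis actually specified them is not stated in this abstract.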
The three longitudinal imputation methods introduce minimal bias (<5%) in the hazard ratio estimates while reducing standard errors by up to 10% compared with CC. Overall, I show that multiple imputation using longitudinal methods is beneficial for repeated measurements of a score variable: it works well both for analyzing change over time and in time-dependent survival analyses.
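Estimates and standard errors from multiply imputed data, such as the hazard ratios compared above, are conventionally combined across the m completed datasets with Rubin's rules, and the same theory explains why moving from 5 to 20 imputations gains little. The sketch below assumes the usual definitions: the pooled estimate is the mean of the m estimates, the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance, and the efficiency of m imputations relative to infinitely many is 1/(1 + γ/m), where γ is the fraction of missing information. The function names, the toy estimates, and γ = 0.3 are illustrative only; the thesis may define relative efficiency differently (for example, against CC analysis).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules.

    Returns the pooled point estimate and its total standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (denominator m - 1)
    total = u_bar + (1 + 1 / m) * b    # Rubin's total variance
    return q_bar, math.sqrt(total)


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many,
    given the fraction of missing information gamma."""
    return 1 / (1 + gamma / m)


# Toy numbers only (e.g., log hazard ratios from m = 5 imputed datasets), not thesis results.
est, se = pool_rubin([0.50, 0.52, 0.48, 0.51, 0.49], [0.0004] * 5)
print(f"pooled estimate {est:.3f}, SE {se:.4f}")

for m in (5, 20):
    print(m, round(relative_efficiency(0.3, m), 3))  # 5 -> 0.943, 20 -> 0.985
```

With γ = 0.3, raising m from 5 to 20 improves efficiency by only about 4 percentage points, which is consistent with the small gains from additional imputations reported in the simulations.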