Enhancing healthcare decisions using big data
Abstract
This dissertation supports US healthcare decisions by analyzing big data. The first chapter contains two papers: the first documents and predicts school opening, closing, and vacation dates over ten years, and the second explores the transmission of common respiratory infectious diseases within families. The second chapter generates synthetic indexes of Medicaid dental benefit generosity for low-income older adults, with the aim of improving their access to dental care. The third chapter evaluates the performance of the Diagnostic Cost Group (DCG) clustering algorithm combined with mixed payment formulas, outliers, and reinsurance strategies on total spending.
The first paper of the first chapter, joint with Ying Liu, Yuhan Chen, Sarah Ma, Maxim Slobodchikov, and Randall Ellis, first describes our use of web sources to collect school opening, closing, and vacation dates (together with federal and state holidays) from the 2010/11 through 2020/21 school years for a large national sample of school districts. We then use this information in an original framework to predict days in public school, not only for the 2,744 school district-year pairs for which we were able to obtain actual school calendars but also for the remaining 1,931 school district-year pairs for which we need to impute dates, primarily earlier years with missing data. Our final results are balanced panels of ten academic years of “predicted days in public school” by week and month for 480 school districts chosen to include all MSA+ and rural areas in the US and all two-digit zip codes.
Building on the results of the first paper, the second paper in the first chapter studies influenza, pneumonia, and respiratory infection to uncover intertemporal, within-family, and across-age cohort infection patterns, deepening our knowledge of the factors that affect the transmission of infectious diseases with properties similar to COVID-19. I combine patient information and diagnoses from the Merative® (formerly IBM) MarketScan® Commercial Database between July 1, 2010, and June 30, 2019, with the MSA-level weekly school data collected by the author with coauthors, which document school opening and closing dates over the same pre-pandemic period. I estimate linear probability models, including weather and other MSA-level control variables, on a sample of 122,487,230 individuals and their weekly diagnostic data. I find that within-family infection rates of pneumonia, influenza, and respiratory infection, especially among high school students, rise with the number of days schools are open. Infected primary and high school students are the main introducers of pneumonia, influenza, and respiratory infections into their families.
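As a rough illustration of what a linear probability model does, the sketch below fits ordinary least squares to a binary infection indicator using a single regressor. It is a minimal sketch under stated assumptions: the function name `fit_lpm`, the variable names, and the toy numbers are all hypothetical, and the models in the dissertation also include weather and other MSA-level controls that are omitted here.

```python
def fit_lpm(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    With a binary (0/1) outcome y this is a linear probability model:
    the slope is the estimated change in the probability that y == 1
    for a one-unit increase in x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical person-weeks: school days open that week, and whether a
# family member had a respiratory diagnosis (1) or not (0).
days_open = [0, 0, 5, 5]
infected = [0, 0, 0, 1]

intercept, slope = fit_lpm(days_open, infected)
print(f"intercept={intercept:.3f}, slope per school day={slope:.3f}")
```

In this toy data the fitted line passes through the two group means, so the slope is simply the difference in infection shares between open and closed weeks divided by five days.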
The second chapter, with Astha Singhal, develops an unbiased synthetic measure of Medicaid dental benefit generosity, using the 2019 MarketScan dental insurance dataset as a nationally representative sample, to improve the accessibility of dental services for low-income older adults. As a measure of coverage generosity, we quantify the proportion of dental procedures that each state’s Medicaid dental policy covers. As an alternative measure, we also consider the proportion of older adults whose dental payments fall under the Annual Maximum Limit (AML) after certain dental services are excluded from it. Results show that the most common dental services are X-rays, exams, and cleanings, and the most expensive dental procedures are implants, root canals, and bridges. The payment generosity measure can help policymakers with limited budgets decide on the state’s AML. Our results indicate that with an AML lower than $500, excluding the three most common dental services from the AML covers the highest proportion of older adults’ payments; with an AML higher than $500, excluding the three most expensive dental services from the AML maximizes payment generosity.
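The alternative generosity measure described above can be sketched as a short calculation: sum each person's payments, skip the services excluded from the AML, and report the share of people at or below the limit. This is only an illustration; the function name `share_under_aml`, the data layout, the "at or below" convention, and the dollar amounts are assumptions made for the example and are not taken from the chapter or from MarketScan.

```python
def share_under_aml(claims, aml, excluded=frozenset()):
    """Share of people whose counted annual payments are at or below `aml`.

    claims   -- dict mapping a person ID to a list of (service, payment) pairs
    aml      -- Annual Maximum Limit in dollars
    excluded -- services whose payments do not count toward the AML
    """
    if not claims:
        raise ValueError("claims must contain at least one person")
    under = 0
    for lines in claims.values():
        counted = sum(pay for service, pay in lines if service not in excluded)
        if counted <= aml:
            under += 1
    return under / len(claims)


# Invented toy claims; amounts are for illustration only.
claims = {
    "a": [("exam", 60), ("xray", 90), ("cleaning", 100)],
    "b": [("exam", 60), ("root_canal", 900)],
    "c": [("cleaning", 100), ("implant", 2000)],
    "d": [("exam", 60), ("xray", 90), ("cleaning", 100), ("filling", 300)],
}

baseline = share_under_aml(claims, aml=500)
common_excluded = share_under_aml(
    claims, aml=500, excluded={"exam", "xray", "cleaning"}
)
print(baseline, common_excluded)
```

Running the same function over a grid of AML values and exclusion sets would give the kind of comparison the chapter reports, though the toy numbers here say nothing about which exclusion rule performs best.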
The third chapter, with Corinne Andriola and Randall Ellis, examines the Diagnostic Cost Group (DCG) framework when combined with mixed payment formulas, outliers, and reinsurance strategies, and evaluates its performance relative to existing payment formulas. These additional models are essential for understanding whether the DCG machine learning algorithm developed by Andriola et al. (2024) achieves the same high performance for other outcomes as it did in the one base case model examined. The paper also explores using the DCG algorithm to re-estimate the twelve additional health spending and health assessment outcomes examined in Ellis et al. (2022) using the additive DXI model. The findings suggest that while DCG may be slightly less accurate than DXI clustering, its predictive performance is similar to that of CCSR, which surpasses that of HCC. The incorporation of ML algorithms renders DCG the simplest among the risk-adjustment models considered. By incorporating outlier, reinsurance, and mixed-payment strategies, DCG effectively prevents underpayments for rare diseases while maintaining accuracy and improving healthcare cost projections in the social insurance system.
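To make the payment strategies named above more concrete, the sketch below shows generic textbook forms of reinsurance and mixed payment applied to one high-cost enrollee. These are assumptions for illustration, not the formulas or parameters used in the chapter: the function names, the attachment point, the reinsurance rate, and the cost share are all hypothetical.

```python
def reinsurance_payment(actual, threshold, rate):
    """Reimburse a fraction `rate` of spending above `threshold`."""
    return rate * max(0.0, actual - threshold)


def mixed_payment(predicted, actual, cost_share):
    """Blend a prospective risk-adjusted payment with actual spending.

    cost_share = 0 is purely prospective; cost_share = 1 is full cost
    reimbursement.
    """
    return (1 - cost_share) * predicted + cost_share * actual


# Hypothetical enrollee with a rare, expensive condition.
predicted = 20_000.0   # risk-adjusted prediction (e.g., from a DCG-style model)
actual = 300_000.0     # realized annual spending

reins = reinsurance_payment(actual, threshold=100_000.0, rate=0.8)
mixed = mixed_payment(predicted, actual, cost_share=0.25)

# Underpayment is realized spending minus what the plan receives.
print("prospective only:", actual - predicted)
print("with reinsurance:", actual - (predicted + reins))
print("mixed payment:   ", actual - mixed)
```

Both strategies shrink the gap between payment and realized cost for this enrollee, which is the mechanism by which such adjustments can limit underpayment for rare, high-cost conditions.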
Description
2024