Applications of big data approaches to topics in infectious disease epidemiology
Benedum, Corey Michael
The availability of big data (i.e., a large number of observations and variables per observation) and advancements in statistical methods present numerous exciting opportunities and challenges in infectious disease epidemiology. The studies in this dissertation address questions regarding the epidemiology of dengue and sepsis by applying big data and traditional epidemiologic approaches. In doing so, we aim to advance our understanding of both diseases and to critically evaluate traditional and novel methods to understand how these approaches can be leveraged to improve epidemiologic research. In the first study, we examined the ability of machine learning and regression modeling approaches to predict dengue occurrence in three endemic locations. When we utilized models with historical surveillance, population, and weather data, machine learning models predicted weekly case counts more accurately than regression models. When we removed surveillance data, regression models were more accurate. Furthermore, machine learning models were able to accurately forecast the onset and duration of dengue outbreaks up to 12 weeks in advance without using surveillance data. This study highlighted potential benefits that machine learning models could bring to a dengue early warning system. The second study utilized machine learning approaches to identify the rainfall conditions that lead to mosquito larvae being washed away from breeding sites in roadside storm drains in Singapore. We then used conventional epidemiologic approaches to evaluate how the occurrence of these washout events affects dengue occurrence in subsequent weeks. This study demonstrated an inverse relationship between washout events and dengue outbreak risk. The third study compared algorithmic-based and conventional epidemiologic approaches used to evaluate variables for statistical adjustment.
We used these approaches to identify which variables to adjust for when estimating the effect of autoimmune disease on 30-day mortality among ICU patients with sepsis. In this study, the presence of autoimmune disease was associated with an approximately 10-20% reduction in mortality risk. Risk estimates identified with algorithmic-based approaches were compatible with those from conventional approaches and did not differ by more than 9%. This study revealed that algorithmic-based approaches can approximate conventional selection methods and may be useful when the appropriate set of variables to adjust for is unknown.
Rights: Attribution 4.0 International