Methods for size estimation of hidden population using large-scale health data

Wang, Jianing

Files

Wang_bu_0017E_18354.pdf(5.17 MB)

Date

2023

Authors

Wang, Jianing

Embargo Date

2027-02-04

URI

https://hdl.handle.net/2144/49713

Abstract

Accurately estimating hidden population sizes is essential for effective policy-making, but a traditional census is typically not feasible. Data-driven approaches that use existing sources are needed for reliable prevalence estimates. Capture-recapture methods, used in ecology to estimate population size, have been advanced in epidemiology over the past two decades to improve disease prevalence estimates and can relax unrealistic assumptions of naive models. Despite these advances, conventional capture-recapture methods remain difficult to apply when estimating prevalence in subpopulations across spatial units. Additionally, there is a need to estimate prevalence in socioeconomically stratified groups to identify higher-risk groups and discover potential health disparities. Unfortunately, conventional approaches rely heavily on stand-alone stratified analysis, which may be less effective when certain subgroups display similar or more intricate patterns of correlated healthcare engagement. To address these challenges, we first articulated the fundamental concepts behind the capture-recapture method by comparing it to a similar approach, the multiplier benchmark method. Then we focused on the capture-recapture structure and proposed a Bayesian hierarchical spatial capture-recapture model that estimates individual detection probabilities and spatial variation in opioid use disorder (OUD) prevalence. The proposed model enables population structure estimation from coarse summaries to finer-scale components using a spatially explicit areal adjacency-based smoothing process model. Finally, an extension of the proposed model is presented to incorporate the correlation structure between socioeconomically stratified subpopulations. We applied the extended model to the Massachusetts Public Health Data Warehouse to evaluate the efficiency of the proposed method compared to traditional methods.
For each of these projects, we used simulation studies to investigate the performance of the proposed estimators in varying circumstances and to determine which method may be more effective in different scenarios. Our comprehensive evaluation found that the proposed methods could accurately estimate area-specific and group-specific prevalence with lower bias and variance. These methods effectively address the issue of data sparsity in subpopulations and account for the underlying structure that more accurately reflects what occurs during ecological and data collection processes. The methods developed in this dissertation provide a powerful tool for accurately estimating the disease burden in hidden subpopulations, making them essential for targeted interventions and effective public health policies.
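
To illustrate the comparison between capture-recapture and the multiplier benchmark method mentioned above, here is a minimal sketch of the textbook two-source estimators. It is not the Bayesian hierarchical spatial model proposed in the dissertation. The function names and all counts are hypothetical and chosen only for illustration.

```python
def lincoln_petersen(n1, n2, m):
    """Two-source capture-recapture estimate of population size.

    n1, n2: number of individuals observed in source 1 and source 2.
    m: number of individuals observed in both sources.
    Assumes a closed population and independent sources.
    """
    if m <= 0:
        raise ValueError("at least one individual must appear in both sources")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def multiplier(benchmark_count, proportion):
    """Multiplier benchmark estimate of population size.

    benchmark_count: members of the hidden population counted in a benchmark
        source (for example, a treatment registry).
    proportion: estimated share of the hidden population that appears in
        that benchmark, taken from an independent survey.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    return benchmark_count / proportion


# Hypothetical counts, for illustration only.
n1, n2, m = 400, 300, 60
print(lincoln_petersen(n1, n2, m))   # 400 * 300 / 60 = 2000.0
print(round(chapman(n1, n2, m), 1))
print(multiplier(400, 0.2))
```

In this toy example the survey-based proportion (0.2) equals the recapture fraction m / n2 = 60 / 300, so the two approaches give the same estimate. They diverge when the benchmark proportion comes from a source with different coverage or bias.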

Description

2023

License

Attribution-NonCommercial-NoDerivatives 4.0 International

Collections

Boston University Theses & Dissertations
