Contemporary challenges in model misspecification

Date
2024
Abstract
In contemporary practical applications, model misspecification has become a significant concern across many fields. When statistical inferences are based on likelihoods, model misspecification can produce inaccurate uncertainty quantification, misleading inferences, and results that are difficult to interpret. It is therefore important to be able to both (1) assess how a model fails to fit the data and (2) mitigate the harm of model misspecification. To address the first challenge, we propose calibrated Bayesian model criticism methods that we call split predictive checks (SPCs). SPCs combine the ease of use and speed of posterior predictive checks with good calibration and power properties. We introduce two variants of SPCs—single SPCs and divided SPCs—and demonstrate their complementary strengths, supported by asymptotic theory. To address the second challenge, in the setting where misspecification leads to overfitting of mixture models, we develop a robust and structurally aware model selection criterion for determining the true number of clusters. We provide theoretical support by proving a consistency result under natural assumptions. Our criterion is validated through simulation studies and an application to flow cytometry data, in which it consistently identifies the correct number of clusters.
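The general idea behind a single split predictive check can be illustrated with a minimal sketch: split the data in two, form the posterior from one half, and compute a posterior predictive p-value for a test statistic on the held-out half. The conjugate Normal model, the choice of the mean as the test statistic, and all variable names below are illustrative assumptions, not the specific construction developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data; the fitted model assumes a Normal(mu, 1) likelihood,
# which may or may not match the true data-generating process.
data = rng.normal(loc=1.0, scale=1.0, size=200)

# 1. Split the data into a "fit" half and a held-out "check" half.
n = len(data)
fit, check = data[: n // 2], data[n // 2:]

# 2. Posterior for mu under a flat prior with known unit variance:
#    mu | fit ~ Normal(mean(fit), 1/len(fit)).
post_mean, post_var = fit.mean(), 1.0 / len(fit)

# 3. Draw posterior predictive replicates of the check half and compare
#    a test statistic (here, the sample mean) to its observed value.
S = 2000
mus = rng.normal(post_mean, np.sqrt(post_var), size=S)
rep_stats = rng.normal(mus[:, None], 1.0, size=(S, len(check))).mean(axis=1)
obs_stat = check.mean()

# 4. Two-sided predictive p-value; values near 0 flag misspecification.
p_upper = np.mean(rep_stats >= obs_stat)
p_value = 2 * min(p_upper, 1 - p_upper)
print(p_value)
```

Because the posterior is computed from data disjoint from the data being checked, the p-value avoids the "double use" of data that makes standard posterior predictive checks conservative; this separation is what the hedged sketch above is meant to convey.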