Robust and reproducible model selection using bagged posteriors
Date
2020-07
Authors
Huggins, Jonathan H.
Miller, Jeffrey W.
Version
First author draft
Citation
Jonathan H. Huggins, Jeffrey W. Miller. 2020. "Robust and Reproducible Model Selection Using Bagged Posteriors." arXiv:2007.14845 [stat.ME]. https://arxiv.org/abs/2007.14845.
Abstract
Bayesian model selection is premised on the assumption that the data are
generated from one of the postulated models; however, in many applications, all of
these models are incorrect. When two or more models provide a nearly equally good
fit to the data, Bayesian model selection can be highly unstable, potentially leading
to self-contradictory findings. In this paper, we explore using bagging on the posterior
distribution ("BayesBag") when performing model selection, that is, averaging the
posterior model probabilities over many bootstrapped datasets. We provide theoretical
results characterizing the asymptotic behavior of the standard posterior and the
BayesBag posterior under misspecification, in the model selection setting. We empirically
assess the BayesBag approach on synthetic and real-world data in (i) feature
selection for linear regression and (ii) phylogenetic tree reconstruction. Our theory
and experiments show that in the presence of misspecification, BayesBag provides
(a) greater reproducibility and (b) greater accuracy in selecting the correct model,
compared to the standard Bayesian posterior; on the other hand, under correct specification,
BayesBag is slightly more conservative than the standard posterior. Overall,
our results demonstrate that BayesBag provides an easy-to-use and widely applicable
approach that improves upon standard Bayesian model selection by making it more
stable and reproducible.
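
The averaging step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two toy models (unit-variance Gaussian data with the mean either fixed at 0 or given a Normal(0, tau2) prior), the uniform prior over models, the bootstrap sample size equal to the original sample size, and the hypothetical function names and defaults. The toy models are chosen only because their marginal likelihoods have closed forms.

```python
import numpy as np


def log_marginal_likelihoods(x, tau2=1.0):
    """Log marginal likelihoods of two toy models for unit-variance Gaussian data.

    M0: mean fixed at 0.  M1: mean ~ Normal(0, tau2).  (Illustrative models only.)
    """
    x = np.asarray(x, dtype=float)
    n, s, ss = x.size, x.sum(), np.sum(x ** 2)
    const = -0.5 * n * np.log(2.0 * np.pi)
    log_m0 = const - 0.5 * ss
    # Under M1 the data are jointly Gaussian with covariance I + tau2 * 1 1^T.
    log_m1 = (const - 0.5 * np.log1p(n * tau2)
              - 0.5 * (ss - tau2 * s ** 2 / (1.0 + n * tau2)))
    return np.array([log_m0, log_m1])


def posterior_model_probs(x, tau2=1.0):
    """Standard posterior model probabilities under a uniform prior on models."""
    log_m = log_marginal_likelihoods(x, tau2)
    w = np.exp(log_m - log_m.max())  # subtract the max for numerical stability
    return w / w.sum()


def bagged_model_probs(x, num_bootstrap=200, seed=0, tau2=1.0):
    """Average posterior model probabilities over bootstrapped datasets."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    total = np.zeros(2)
    for _ in range(num_bootstrap):
        # Assumption: each bootstrap dataset has the same size as the original.
        resample = rng.choice(x, size=x.size, replace=True)
        total += posterior_model_probs(resample, tau2)
    return total / num_bootstrap


# Heavy-tailed data, so both Gaussian toy models are misspecified.
data = np.random.default_rng(1).standard_t(df=3, size=50) + 0.2
standard = posterior_model_probs(data)
bagged = bagged_model_probs(data)
print("standard posterior:", standard)
print("bagged posterior:  ", bagged)
```

In this sketch the only change from standard Bayesian model selection is the outer loop: the posterior computation itself is reused unchanged on each resampled dataset, so the cost scales with the number of bootstrap replicates.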