Guaranteed validity for empirical approaches to adaptive data analysis
Files
Published version
Date
2020-08-15
Authors
Rogers, Ryan
Roth, Aaron
Smith, Adam
Srebro, Nathan
Thakkar, Om
Woodworth, Blake
Citation
R. Rogers, A. Roth, A. Smith, N. Srebro, O. Thakkar, B. Woodworth. 2020. "Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis." 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020
Abstract
We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings, often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.
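For readers unfamiliar with the sample-splitting baseline mentioned in the abstract, the following is a minimal illustrative sketch. It is not the authors' library or method. It assumes i.i.d. samples and queries taking values in [0, 1], gives each adaptive query its own fresh chunk of data, and reports a Hoeffding confidence interval with a union bound over the k queries. The names `SampleSplitter` and `hoeffding_width` are hypothetical and chosen only for this example.

```python
import math
import random


def hoeffding_width(m, k, beta):
    """Half-width of a two-sided Hoeffding interval for a [0, 1]-valued
    query on m fresh samples, with total failure probability beta
    split evenly over k queries (union bound)."""
    return math.sqrt(math.log(2 * k / beta) / (2 * m))


class SampleSplitter:
    """Hypothetical helper: answers up to k adaptive queries,
    using one disjoint chunk of the data per query."""

    def __init__(self, data, k, beta=0.05):
        self.m = len(data) // k  # samples available per query
        if self.m == 0:
            raise ValueError("need at least one sample per query")
        self.chunks = [data[i * self.m:(i + 1) * self.m] for i in range(k)]
        self.width = hoeffding_width(self.m, k, beta)
        self.used = 0

    def answer(self, query):
        """Return (point estimate, interval half-width) for one query."""
        if self.used >= len(self.chunks):
            raise RuntimeError("query budget exhausted")
        chunk = self.chunks[self.used]
        self.used += 1
        estimate = sum(query(x) for x in chunk) / len(chunk)
        return estimate, self.width


# Usage: synthetic uniform data, one threshold query.
rng = random.Random(0)
samples = [rng.random() for _ in range(10_000)]
splitter = SampleSplitter(samples, k=10)
est, width = splitter.answer(lambda x: float(x > 0.5))
```

Because each chunk is untouched until its query is asked, the interval stays valid however the analyst chooses later queries. The cost is that every query sees only n/k samples, which is the kind of baseline the abstract compares against.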
License
Copyright 2020 by the author(s).