Guaranteed validity for empirical approaches to adaptive data analysis

Files
rogers20a.pdf (635.06 KB)
Published version
Date
2020-08-15
Authors
Rogers, Ryan
Roth, Aaron
Smith, Adam
Srebro, Nathan
Thakkar, Om
Woodworth, Blake
Citation
R. Rogers, A. Roth, A. Smith, N. Srebro, O. Thakkar, B. Woodworth. 2020. "Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis." 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020.
Abstract
We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has focused either on providing tight confidence intervals for specific analyses or on providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings, often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.
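For intuition about the sample-splitting baseline mentioned in the abstract, the following is a minimal illustrative Python sketch (not the authors' library or its API): each statistical query is answered on its own disjoint chunk of the data, so a standard Hoeffding confidence interval remains valid even when queries are chosen adaptively, at the cost of interval widths that grow as more queries are asked.

```python
# Illustrative sketch of the sample-splitting baseline (not the paper's method):
# answer each adaptive statistical query q_i : X -> [0, 1] on a disjoint chunk
# of the data, so ordinary non-adaptive confidence intervals stay valid.
import numpy as np


def sample_splitting_answers(data, queries, delta=0.05):
    """Return (point_estimate, half_width) for each query.

    Each interval is a Hoeffding confidence interval at level 1 - delta;
    validity holds because the i-th query never touches the data used
    to answer the other queries.
    """
    k = len(queries)
    splits = np.array_split(data, k)  # disjoint chunks, one per query
    answers = []
    for q, chunk in zip(queries, splits):
        n_i = len(chunk)
        estimate = np.mean([q(x) for x in chunk])
        # Hoeffding bound for [0, 1]-valued queries:
        # P(|mean - E[q]| > t) <= 2 exp(-2 n t^2)
        half_width = np.sqrt(np.log(2.0 / delta) / (2.0 * n_i))
        answers.append((estimate, half_width))
    return answers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(size=10_000)
    # Three queries; each gets only ~n/3 samples under sample splitting,
    # so the confidence intervals widen as the number of queries grows.
    queries = [lambda x: x, lambda x: x**2, lambda x: float(x > 0.5)]
    for est, hw in sample_splitting_answers(data, queries):
        print(f"{est:.3f} +/- {hw:.3f}")
```

This illustrates only the baseline the paper compares against; the paper's contribution is instance-specific confidence intervals that can be far tighter when paired with good heuristics.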
License
Copyright 2020 by the author(s).