Auditing black-box prediction models for data minimization compliance
Date
2021
Authors
Rastegarpanah, Bashir
Gummadi, Krishna P.
Crovella, Mark
Version
Published version
Citation
B. Rastegarpanah, K. P. Gummadi, and M. Crovella. 2021. "Auditing Black-Box Prediction Models for Data Minimization Compliance." In Advances in Neural Information Processing Systems (NeurIPS). https://proceedings.neurips.cc/paper/2021/hash/ac6b3cce8c74b2e23688c3e45532e2a7-Abstract.html
Abstract
In this paper, we focus on auditing black-box prediction models for compliance with
the GDPR’s data minimization principle. This principle restricts prediction models to using only the minimal information that is necessary for performing the task at hand.
Given the challenge of the black-box setting, our key idea is to check whether each of the prediction model's input features is individually necessary: we assign the feature some constant value (i.e., apply a simple imputation) across all prediction instances and measure the extent to which the model's outcomes change. We introduce a metric for data minimization that is based on model instability under simple imputations.
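
As a deliberately simplified illustration of this instability idea (not the paper's exact metric), the Python sketch below imputes each feature's column median across all audit instances and reports the fraction of predictions that change; the `predict` interface, the median as the imputed constant, and the function name are assumptions made for the example.

```python
import numpy as np

def instability_per_feature(predict, X):
    """For each feature, apply one simple imputation (here: the column
    median) across all audit instances and record the fraction of model
    outcomes that change. A near-zero score suggests the feature may not
    be individually necessary. `predict` is assumed to be a black-box
    function mapping a 2-D array of instances to a 1-D array of outcomes."""
    baseline = predict(X)
    scores = {}
    for j in range(X.shape[1]):
        X_imp = X.copy()
        X_imp[:, j] = np.median(X[:, j])  # one feasible simple imputation
        scores[j] = float(np.mean(predict(X_imp) != baseline))
    return scores
```

A feature whose score stays near zero under some constant replacement is a candidate for being unnecessary; the paper's metric is built on this notion of instability under simple imputations.
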
We extend the applicability of this metric from a finite-sample setting to a distributional one by introducing a probabilistic data minimization guarantee, which we derive using a Bayesian approach.
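
To give a sense of how such a probabilistic guarantee might be instantiated (a sketch under assumptions, not the paper's derivation): treat each queried instance under a given imputation as a Bernoulli trial of whether the outcome changes, place a Beta prior on the unknown change probability, and read the guarantee off the posterior. The function name, the uniform prior, and the tolerance parameter are illustrative.

```python
from scipy.stats import beta

def posterior_stability_guarantee(n_changed, n_total, tolerance,
                                  prior_a=1.0, prior_b=1.0):
    """Posterior probability that the true outcome-change probability
    under a given simple imputation is at most `tolerance`, assuming a
    Beta(prior_a, prior_b) prior and a Bernoulli likelihood (changed /
    unchanged per queried instance)."""
    a = prior_a + n_changed
    b = prior_b + (n_total - n_changed)
    return beta.cdf(tolerance, a, b)

# E.g., 2 changed outcomes in 200 queries with tolerance 0.05 yields a
# posterior probability close to 1, supporting a data minimization claim
# at a high confidence level.
```
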
Furthermore, we address the auditing problem under a constraint on the number of queries to the prediction system. We formulate the problem of allocating a budget of system queries across feasible simple imputations (for investigating model instability) as a multi-armed bandit problem with probabilistic success metrics.
We define two bandit problems for providing a probabilistic data minimization guarantee at a given confidence level: a decision problem, given a target data minimization level, and a measurement problem, given a fixed query budget.
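
To make the budget-allocation view concrete, the stand-in sketch below treats each feasible simple imputation as a bandit arm; each pull queries the prediction system once with that imputation applied and observes whether the outcome changed. The greedy posterior-certificate rule here is only an illustrative strategy, not the paper's exploration algorithm, and `pull` is an assumed interface.

```python
import numpy as np
from scipy.stats import beta

def audit_with_budget(pull, n_arms, budget, tolerance, confidence):
    """Allocate a budget of system queries across candidate imputations.
    `pull(arm)` is assumed to query the black-box model on one instance
    with that arm's imputation applied and return 1 if the outcome
    changed, else 0. Stops early when some arm's posterior certificate
    P(change probability <= tolerance) reaches `confidence` (the decision
    problem); otherwise returns the best certificate found within the
    budget (the measurement problem)."""
    changed = np.zeros(n_arms)
    pulls = np.zeros(n_arms)
    for _ in range(budget):
        # Per-arm certificate under a Beta(1, 1) prior.
        certs = beta.cdf(tolerance, 1 + changed, 1 + pulls - changed)
        best = int(np.argmax(certs))
        if certs[best] >= confidence:
            return best, float(certs[best])
        changed[best] += pull(best)
        pulls[best] += 1
    certs = beta.cdf(tolerance, 1 + changed, 1 + pulls - changed)
    best = int(np.argmax(certs))
    return best, float(certs[best])
```
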
We design efficient algorithms for these auditing problems using novel exploration strategies that extend classical bandit strategies. Our experiments with real-world prediction systems show that our auditing algorithms significantly outperform simpler benchmarks on both the measurement and decision problems.