From soft classifiers to hard decisions

Date Issued
2019
Publisher Version
10.1145/3287560.3287561
Author(s)
Canetti, Ran
Cohen, Aloni
Dikkala, Nishanth
Ramnarayan, Govind
Scheffler, Sarah
Smith, Adam
Permanent Link
https://hdl.handle.net/2144/40972
Version
First author draft
Citation (published version)
Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, Adam Smith. 2019. "From Soft Classifiers to Hard Decisions." Proceedings of the Conference on Fairness, Accountability, and Transparency - FAT* '19, https://doi.org/10.1145/3287560.3287561
Abstract
A popular methodology for building binary decision-making classifiers in the presence of imperfect information is to first construct a calibrated non-binary "scoring" classifier, and then to post-process this score to obtain a binary decision. We study various fairness (or, error-balance) properties of this methodology, when the non-binary scores are calibrated over all protected groups, and with a variety of post-processing algorithms. Specifically, we show:
First, there does not exist a general way to post-process a calibrated classifier to equalize protected groups' positive or negative predictive value (PPV or NPV). For certain "nice" calibrated classifiers, either PPV or NPV can be equalized when the post-processor uses different thresholds across protected groups. Still, when the post-processing consists of a single global threshold across all groups, natural fairness properties, such as equalizing PPV in a nontrivial way, do not hold even for "nice" classifiers.
Second, when the post-processing stage is allowed to defer on some decisions (that is, to avoid making a decision by handing off some examples to a separate process), then for the non-deferred decisions, the resulting classifier can be made to equalize PPV, NPV, false positive rate (FPR) and false negative rate (FNR) across the protected groups. This suggests a way to partially evade the impossibility results of Chouldechova and Kleinberg et al., which preclude equalizing all of these measures simultaneously. We also present different deferring strategies and show how they affect the fairness properties of the overall system.
We evaluate our post-processing techniques using the COMPAS data set from 2016.
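As an illustration of the quantities named in the abstract, the following minimal Python sketch applies a single global threshold to two toy groups whose scores are calibrated within each group, then computes PPV, NPV, FPR and FNR over the non-deferred decisions. The data, the function names (`rates`, `threshold`, `threshold_with_deferral`) and the "defer inside a score band" rule are all assumptions made for this example. They are not the paper's algorithms or its COMPAS experiments, and the sketch only shows the mechanics, not the paper's results.

```python
from collections import namedtuple

Rates = namedtuple("Rates", "ppv npv fpr fnr")


def rates(decisions, labels):
    """PPV, NPV, FPR, FNR over non-deferred decisions (None means deferred)."""
    tp = fp = tn = fn = 0
    for d, y in zip(decisions, labels):
        if d is None:
            continue  # deferred examples are handed off and not counted
        if d == 1 and y == 1:
            tp += 1
        elif d == 1 and y == 0:
            fp += 1
        elif d == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return Rates(
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
        fpr=ratio(fp, fp + tn),
        fnr=ratio(fn, fn + tp),
    )


def threshold(scores, t):
    """Hard decision: 1 if the score is at least t, else 0."""
    return [1 if s >= t else 0 for s in scores]


def threshold_with_deferral(scores, low, high):
    """Hypothetical deferral rule: decide outside [low, high), defer inside."""
    return [1 if s >= high else 0 if s < low else None for s in scores]


# Toy data (assumed). Each score bucket is calibrated within its group:
# the fraction of positive labels in a bucket equals the bucket's score.
scores_a = [0.2] * 5 + [0.8] * 5
labels_a = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
scores_b = [0.4] * 5 + [0.6] * 5
labels_b = [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0]

# A single global threshold of 0.5 yields different PPVs (0.8 vs 0.6).
rates_a = rates(threshold(scores_a, 0.5), labels_a)
rates_b = rates(threshold(scores_b, 0.5), labels_b)
print("group A:", rates_a)
print("group B:", rates_b)

# With deferral, rates are computed only over the decided examples.
deferred_b = threshold_with_deferral(scores_b, low=0.5, high=0.7)
print("group B deferred count:", deferred_b.count(None))
print("group B rates on decided examples:", rates(deferred_b, labels_b))
```

In this toy data both groups are calibrated, yet the shared threshold gives group A a PPV of 0.8 and group B a PPV of 0.6, which is the kind of gap the first result concerns. The deferral call only demonstrates how deferred examples drop out of the rate computation. Choosing the deferral rule so that the four rates actually equalize is the subject of the paper itself.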