Estimating smooth GLM in non-interactive local differential privacy model with public unlabeled data
Files
Accepted manuscript
Date
DOI
Authors
Wang, Di
Zhang, Huanyu
Gaboardi, Marco
Xu, Jinhui
Version
OA Version
Citation
D. Wang, H. Zhang, M. Gaboardi, J. Xu. "Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data." Proceedings of Machine Learning Research. International Conference on Algorithmic Learning Theory
Abstract
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. Firstly, motived by Stein’s lemma, we show that if each data record is i.i.d. sampled from zero-mean Gaussian distribution, we show that there exists an (𝜖, 𝛿)-NLDP algorithm for GLM. The sample complexity of the public and private data, for the algorithm to achieve an 𝛼 estimation error (in 𝓁_2-norm) with high probability, is O(p𝛼^-2) and O(p^3𝛼^-2𝜖^-2), respectively. This is a significant improvement over the previously known exponential or quasi-polynomial in 𝛼^-1, or exponential in p sample complexity of GLM with no public data. Then, by a variant of Stein’s lemma, we show that there is an (𝜖, 𝛿)-NLDP algorithm for GLM (under some mild assumptions), if each data record is i.i.d sampled from some sub-Gaussian distribution with bounded 𝓁_1-norm. Then the sample complexity of the public and private data, for the algorithm to achieve an estimation error (in 𝓁∞-norm) with high probability, is O(p^2𝛼^-2) and O(p^2𝛼^-2𝜖^-2), respectively, if is not too small (i.e., 𝛼 ≥ Ω (1/√p )), where p is the dimensionality of the data. We also extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real world datasets. To our best knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
Description
License
© 2021 D. Wang, L. Hu, H. Zhang, M. Gaboardi & J. Xu. This preprint is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).