DEMO: integrating MPC in big data workflows
Date
2016
Authors
Volgushev, Nikolaj
Schwarzkopf, Malte
Lapets, Andrei
Varia, Mayank
Bestavros, Azer
Version
Accepted manuscript
Citation
Nikolaj Volgushev, Malte Schwarzkopf, Andrei Lapets, Mayank Varia, Azer Bestavros. 2016. "DEMO: Integrating MPC in Big Data Workflows." IACR Cryptology ePrint Archive, Volume 2016, pp. 883 - 883.
Abstract
Secure multi-party computation (MPC) allows multiple parties to
perform a joint computation without disclosing their private inputs. Many real-world joint computation use cases, however, involve data analyses on very large data sets, and are implemented by
software engineers who lack MPC knowledge. Moreover, the collaborating parties – e.g., several companies – often deploy different
data analytics stacks internally. These restrictions hamper the real-world usability of MPC. To address these challenges, we combine
existing MPC frameworks with data-parallel analytics frameworks
by extending the Musketeer big data workflow manager [4]. Musketeer automatically generates code for both the sensitive parts of a
workflow, which are executed in MPC, and the remaining portions
of the computation, which run on scalable, widely-deployed analytics systems. In a prototype use case, we compute the Herfindahl-Hirschman Index (HHI), an index of market concentration used
in antitrust regulation, on an aggregate 156 GB of taxi trip data
over five transportation companies. Our implementation computes
the HHI in about 20 minutes using a combination of Hadoop and
VIFF [1], while even “mixed mode” MPC with VIFF alone would
have taken many hours. Finally, we discuss future research questions that we seek to address using our approach.
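
For readers unfamiliar with the HHI, the following is a minimal plain-Python sketch of the quantity being computed: the sum of squared market shares. It is not the authors' Musketeer, Hadoop, or VIFF code and performs no MPC. The function names `local_revenue` and `hhi`, and the use of fare totals as the measure of market share, are assumptions made for illustration. In a workflow like the one the abstract describes, a per-company aggregation such as `local_revenue` would plausibly run on each company's own analytics stack, and only a small final step such as `hhi` would need to run under MPC.

```python
from fractions import Fraction


def local_revenue(trip_fares):
    """Aggregate one company's private trip records into a single total.

    Hypothetical helper: stands in for the large, data-parallel part of
    the workflow that each party could run locally.
    """
    return sum(trip_fares)


def hhi(revenues):
    """Return the Herfindahl-Hirschman Index for per-company revenues.

    Market shares are expressed as fractions of the total, so the result
    lies between 1/n (n equal competitors) and 1 (a monopoly).
    """
    total = sum(revenues)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    # Sum of squared market shares, computed exactly with rationals.
    return sum((Fraction(r) / total) ** 2 for r in revenues)


if __name__ == "__main__":
    # Made-up fares for three hypothetical companies.
    per_company = [
        local_revenue([20, 30]),
        local_revenue([10, 20]),
        local_revenue([20]),
    ]
    print(float(hhi(per_company)))
```

Regulators often quote the index with shares in percent, which scales this value by 10,000. Exact rationals are used here only to keep the illustration free of floating-point rounding.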
License
Copyright 2016 held by the owner/author(s).