PlinyCompute: a platform for high-performance, distributed, data-intensive tool development

Files
1711.05573v2.pdf (1.03 MB)
Published version
Date
2017
Authors
Zou, Jia
Barnett, R. Matthew
Lorido-Botran, Tania
Luo, Shangyu
Monroy, Carlos
Sikdar, Sourav
Teymourian, Kia
Yuan, Binhang
Jermaine, Chris
Version
OA Version
Citation
Jia Zou, R. Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine. 2017. "PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development." CoRR, Volume abs/1711.05573.
Abstract
This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. In the small, however, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark.