CAPSys: Contention-aware task placement for data stream processing
Date
2024-09
Authors
Wang, Yuanli
Huang, Lei
Wang, Zikun
Kalavri, Vasiliki
Matta, Ibrahim
Version
OA Version
Abstract
In the context of streaming dataflow queries, the task placement problem aims to identify a mapping of operator tasks to physical resources in a distributed cluster. We show that task placement significantly affects not only query performance but also the convergence and accuracy of auto-scaling controllers. We propose CAPSys, an adaptive resource controller for dataflow stream processors that considers auto-scaling and task placement in concert. CAPSys relies on Contention-Aware Placement Search (CAPS), a new placement strategy that ensures compute-intensive, I/O-intensive, and network-intensive tasks are balanced across available resources.
We integrate CAPSys with Apache Flink and show that it consistently achieves higher throughput and lower backpressure than Flink’s strategies, while also improving the convergence of the DS2 auto-scaling controller under variable workloads. Compared with the state-of-the-art ODRP placement strategy, CAPSys computes the task placement orders of magnitude faster and achieves up to 6× higher throughput.
License
CC0 1.0 Universal