CAESAR: coherence-aided elective and seamless alternative routing via on-chip FPGA

Files
CAESAR_RTSS22.pdf (767.97 KB)
Accepted manuscript
Date
2022-12
Authors
Roozkhosh, Shahin
Hoornaert, Denis
Mancuso, Renato
Version
Accepted manuscript
Citation
S. Roozkhosh, D. Hoornaert, R. Mancuso. 2022. "CAESAR: Coherence-Aided Elective and Seamless Alternative Routing via on-chip FPGA" Proceedings - Real-Time Systems Symposium, pp. 356-369. https://doi.org/10.1109/rtss55097.2022.00038
Abstract
Prompted by the ever-growing demand for high-performance Systems-on-Chip (SoCs) and the plateauing of CPU frequencies, the SoC design landscape is shifting. In a quest to offer programmable specialization, major vendors have embraced tightly-coupled FPGAs co-located with traditional compute clusters. This CPU+FPGA architectural paradigm opens the door to novel hardware/software co-design opportunities. The key principle is that CPU-originated memory traffic can be re-routed through the FPGA for analysis and management purposes. Albeit promising, the side-effect of this approach is that time-critical operations, such as cache-line refills, are fulfilled by moving data over slower interconnects meant for I/O traffic. In this article, we introduce a novel principle named Cache Coherence Backstabbing to precisely tackle these shortcomings. The technique leverages the ability to include the FPGA in the same coherence domain as the core processing elements. Importantly, this enables Coherence-Aided Elective and Seamless Alternative Routing (CAESAR), i.e., seamless inspection and routing of memory transactions, especially cache-line refills, through the FPGA. CAESAR allows the definition of new memory programming paradigms. We discuss the intrinsic potential of the approach and evaluate it with a full-stack prototype implementation on a commercial platform. Our experiments show an improvement of up to 29% in read bandwidth, 23% in latency, and 13% in pragmatic workloads over the state of the art. Furthermore, we showcase the first in-coherence-domain run-time profiler design as a use-case of the CAESAR approach.