Hardware transactional memory exploration in coherence-free many-core architectures

Date
2018-12-01
Authors
Papagiannopoulou, Dimitra
Marongiu, Andrea
Moreshet, Tali
Benini, Luca
Herlihy, Maurice
Bahar, R. Iris
Version
Accepted manuscript
Citation
Dimitra Papagiannopoulou, Andrea Marongiu, Tali Moreshet, Luca Benini, Maurice Herlihy, R. Iris Bahar. 2018. "Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures." International Journal of Parallel Programming, Volume 46, Issue 6, pp. 1304-1328 (25). https://doi.org/10.1007/s10766-018-0569-7
Abstract
High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. To keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories, rather than hardware-managed caches that require some form of cache coherence management. These “coherence-free” systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches may be employed to accomplish the synchronization, but may lead to both usability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds new challenges in the design of hardware speculative support. In this article, we present a new scheme for hardware transactional memory (HTM) support within a cluster-based, many-core embedded system that lacks an underlying cache-coherence protocol. We propose two alternative data versioning implementations for the HTM support, Full-Mirroring and Distributed Logging, and we conduct a performance comparison between them. To the best of our knowledge, these are the first designs for speculative synchronization for this type of architecture. Through a set of benchmark experiments using our simulation platform, we show that our designs can achieve significant performance improvements over traditional lock-based schemes.
License
This is a post-peer-review, pre-copyedit version of an article published in INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING. The final authenticated version is available online at: https://doi.org/10.1007/s10766-018-0569-7.