Benchmarking learned and LSM indexes for data sortedness

Raman, Aneesh; Huynh, Andy; Lu, Jinqi; Athanassoulis, Manos

Benchmarking learned and LSM indexes for data sortedness

Files

3662165.3662764.pdf(1.32 MB)

Published version

Date

2024-06-09

DOI

10.1145/3662165.3662764

Authors

Raman, Aneesh

Huynh, Andy

Lu, Jinqi

Athanassoulis, Manos

URI

https://hdl.handle.net/2144/49417

Citation

Aneesh Raman, Andy Huynh, Jinqi Lu, and Manos Athanassoulis. 2024. Benchmarking Learned and LSM Indexes for Data Sortedness. In Proceedings of the Tenth International Workshop on Testing Database Systems (DBTest '24). Association for Computing Machinery, New York, NY, USA, 16–22. https://doi.org/10.1145/3662165.3662764

Abstract

Database systems use indexes on frequently accessed attributes to accelerate query and transaction processing. This requires paying the cost of maintaining and updating those indexes, which can be thought of as the process of adding structure (e.g., sort) to an otherwise unstructured data collection. The indexing cost is redundant when data arrives pre-sorted, even if only up to some degree. While recent work has studied how classical indexes like B+-trees cannot fully exploit the near-sortedness during ingestion, there is a lack of this exploration on other index designs like read-optimized learned indexes or write-optimized LSM-trees. In this paper, we bridge this gap by conducting the first-ever study on the behavior of learned indexes and LSM-trees when varying the data sortedness in an ingestion workload. Specifically, we build on prior work on benchmarking data sortedness on B+-trees and we expand the scope to benchmark: (i) ALEX and LIPP, which are updatable learned index designs; and (ii) the LSM-tree engine offered by RocksDB. We present our evaluation framework and detail key insights on the performance of the index designs when varying data sortedness. Our observations indicate that learned indexes exhibit unpredictable performance when ingesting differently sorted data, while LSM-trees can benefit from sortedness-aware optimizations. We highlight the potential headroom for improvement and lay the groundwork for further research in this domain.

License

© 2024 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution International 4.0 License. This article has been published under a Read & Publish Transformative Open Access (OA) Agreement with ACM.

cb

Collections

BU Open Access Articles

Full item page