Benchmarking, analyzing, and optimizing partial compaction in LSM-trees
Abstract
Log-structured merge (LSM) trees are widely adopted by NoSQL databases because they offer high ingestion throughput: incoming data is logged in a memory buffer and flushed to disk as a sorted run when the buffer fills up. To reduce space amplification and facilitate queries, LSM-trees periodically merge-sort data on storage to form fewer but larger sorted runs (a process termed compaction). In commercial LSM-based systems, sorted runs are often broken down into sets of small files, allowing partial compaction (i.e., a single file is picked to compact), which effectively reduces the worst-case compaction latency without increasing the amortized write cost. However, since a sorted run may contain multiple files, deciding which file to compact affects the overall write cost, and the optimal decision remains an open research question. Specifically, while there are many state-of-the-art file selection policies (e.g., MinOverlappingRatio, RoundRobin), it remains unclear how far they are from optimal. This work focuses on how to identify the best file to compact in an offline setting, and on how different file picking policies perform under different workloads.
This thesis first designs an algorithm that enumerates all possible file selections, with pruning, to find the minimum number of written bytes for a given workload. It then benchmarks four state-of-the-art file picking policies, MinOverlappingRatio (MOR), RoundRobin (RR), OldestLargestSeqFirst (OLSF), and OldestSmallestSeqFirst (OSSF), under different ingestion workloads, and compares them against this minimum to investigate the improvement headroom for each policy. The experimental results show that the headroom varies from 1.5% to 14.8% across workload types, but no clear relationship between the headroom and the workload type is revealed. Moreover, compared to the minimum, the state-of-the-art policies are much less stable (i.e., the number of written bytes deviates significantly when the same type of workload is run multiple times); a further study of MOR's instability finds that the dynamic size of SSTables is the main cause. In the experiment with different update distributions, performance is consistent: the write amplifications (WAs) of OLSF and OSSF remain the first and second highest. Another observation is that, compared with MOR, the performance of RR shifts from 4.39% to 7.60% as skewness increases. In the scalability experiment, OSSF stands out under workloads with a large number of write operations, lowering WA by up to 2.27%.
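To make the file-picking problem concrete, the following is a minimal illustrative sketch of the MinOverlappingRatio idea: among the files in the compacting level, pick the one whose bytes of overlap with the next level are smallest relative to its own size, since that file can be merged down while rewriting the least data. The `File` structure and function names here are hypothetical simplifications, not RocksDB's actual API.

```python
# Hypothetical sketch of MinOverlappingRatio (MOR) file picking.
# Key ranges and sizes are illustrative; real SSTables carry more metadata.
from dataclasses import dataclass

@dataclass
class File:
    smallest_key: str  # inclusive key range of the SSTable
    largest_key: str
    size: int          # file size in bytes

def overlap_bytes(f: File, next_level: list[File]) -> int:
    """Total size of next-level files whose key ranges intersect f's."""
    return sum(g.size for g in next_level
               if not (g.largest_key < f.smallest_key or
                       g.smallest_key > f.largest_key))

def pick_min_overlapping_ratio(level: list[File], next_level: list[File]) -> File:
    """Pick the file minimizing (overlapping bytes / own size), a write-cost proxy."""
    return min(level, key=lambda f: overlap_bytes(f, next_level) / f.size)

# Example: the file covering ["d", "f"] overlaps nothing in the next level,
# so compacting it rewrites the fewest bytes.
level1 = [File("a", "c", 100), File("d", "f", 100)]
level2 = [File("a", "b", 300)]
assert pick_min_overlapping_ratio(level1, level2).smallest_key == "d"
```

The offline enumeration algorithm in this thesis explores all such choices exhaustively (with pruning) to obtain the true minimum that heuristics like this greedy ratio can be compared against.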
2024