Tuning LSM trees using Bayesian optimization
OA Version
Citation
Abstract
With the exponential growth of data generation, optimizing databases and their underlying storage structures has become an area of extensive and critical research. This thesis addresses an important aspect of this challenge by introducing an approach to optimizing Log-Structured Merge Trees (LSM Trees), a state-of-the-art storage structure designed primarily for write-heavy database applications without compromising read performance. It uses Bayesian optimization via the BoTorch library to fine-tune LSM Tree configurations so that they balance performance across different workloads, addressing the longstanding challenge of dynamic workload adaptability. A pivotal aspect of this approach is the adaptation of Bayesian optimization to explore the LSM Tree parameter space intelligently by handling categorical and continuous variables separately, which enables a more thorough examination of the cost surface. The approach builds on a comprehensive analysis of the LSM Tree structure, its amplification issues, and its overall operational mechanics. The proposed solution is implemented not only on the classic LSM Tree but also on hybrid LSM Tree structures and their compaction strategies. It combines the BoTorch framework with an established analytical cost model, which serves as the objective function for the optimization process. This addresses a notable limitation of using the closed-form cost function to predict design decisions: doing so solves a linear program instead of an integer linear program and treats all values as continuous parameters, which does not accurately reflect the discrete nature of certain design decisions. Experimental validation on diverse workloads demonstrates the efficiency of the proposed approach and shows significant performance gains over traditional tuning methods.
This thesis contributes to the growing body of research on database optimization strategies and helps database administrators tune the performance of LSM Trees with minimal manual intervention. It provides an incremental step towards self-tuning database management systems, in which tuning and optimization are automated, paving the way for better, more reliable storage solutions.
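The idea of handling categorical and continuous parameters separately can be illustrated with a minimal Python sketch. It uses neither BoTorch nor the thesis's cost model: `toy_cost` is a hypothetical, heavily simplified I/O cost function, all constants are illustrative assumptions, and a plain grid search stands in for the Bayesian optimization loop. The sketch only shows the structure of the search, in which the discrete choices (compaction policy, integer size ratio) are enumerated and kept discrete while the continuous variable (the share of memory given to Bloom filters) is searched for each fixed discrete configuration.

```python
import math
from itertools import product


def toy_cost(policy, size_ratio, filter_frac, *, write_frac=0.5,
             n_entries=1e8, entry_bytes=1024, mem_bytes=2**30, page_entries=4):
    """Hypothetical per-operation I/O cost (NOT the thesis's cost model).

    policy       -- categorical: "leveling" or "tiering"
    size_ratio   -- integer size ratio between adjacent levels (>= 2)
    filter_frac  -- continuous share of memory given to Bloom filters, in (0, 1)
    """
    buffer_entries = (1.0 - filter_frac) * mem_bytes / entry_bytes
    bits_per_entry = filter_frac * mem_bytes * 8 / n_entries
    levels = max(1, math.ceil(math.log(n_entries / buffer_entries, size_ratio)))
    # Standard Bloom filter false-positive approximation.
    false_positive_rate = math.exp(-bits_per_entry * math.log(2) ** 2)

    if policy == "leveling":
        write = size_ratio * levels / page_entries
        read = levels * false_positive_rate
    else:  # "tiering"
        write = levels / page_entries
        read = levels * (size_ratio - 1) * false_positive_rate
    return write_frac * write + (1.0 - write_frac) * read


def tune(write_frac):
    """Return (cost, policy, size_ratio, filter_frac) with the lowest toy cost."""
    policies = ("leveling", "tiering")                 # categorical variable
    size_ratios = range(2, 21)                         # integer variable
    filter_fracs = [i / 100 for i in range(1, 100)]    # continuous variable, gridded

    best = None
    # Outer loop: enumerate discrete choices so they are never rounded.
    for policy, ratio in product(policies, size_ratios):
        # Inner search: optimize the continuous variable with the discrete ones fixed.
        frac = min(filter_fracs,
                   key=lambda f: toy_cost(policy, ratio, f, write_frac=write_frac))
        cost = toy_cost(policy, ratio, frac, write_frac=write_frac)
        if best is None or cost < best[0]:
            best = (cost, policy, ratio, frac)
    return best


if __name__ == "__main__":
    for share in (0.1, 0.5, 0.9):
        print(f"write share {share}: {tune(share)}")
```

In a Bayesian optimization setting, the inner grid search would be replaced by fitting a surrogate model to observed costs and maximizing an acquisition function, but every returned configuration would still have a valid policy and an integer size ratio, with no continuous relaxation to round afterwards.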
Description
2024
License
Attribution-NonCommercial-ShareAlike 4.0 International