Optimizing the optimizer: increasing performance efficiency of modern compilers

Abstract
A long-standing goal, which is increasingly important in the post-Moore era, is to augment system performance by building more intelligent compilers. One of our motivating hypotheses is that much of the capability needed to advance compiler optimization is already present: state-of-the-art compilers not only provide a large set of code transformations, but also (by and large) apply them correctly, preserving the semantics and functionality of the code. The challenge lies in getting the compiler to select an appropriate sequence and number of these transformations so as to generate the highest-performing code with respect to developer goals such as size, speed, or energy.

In this thesis, we develop novel approaches to automatically generating performant code. In particular, we showcase the use of deep learning in building a next-generation "smart" compilation pipeline. Deep learning can capture complex relationships between code and compilation heuristics, making it an excellent tool for optimization tasks. First, we use it to predict a minimal set of optimization options that increases performance on a per-application basis. This is especially useful for frequently used applications and kernels, for which developers are willing to spend hours to obtain a few percent of performance improvement. Manually developing such heuristics is nearly impossible given the complexity and vastness of the optimization space. This led to two research thrusts. The first is optimizing how transformations and heuristics within a CPU compiler, such as GCC or LLVM, are applied. We note that doing so has the potential to benefit not only code targeting CPUs, but also code targeting hardware, e.g., FPGAs, because of the proliferation of High-Level Synthesis (HLS), i.e., the use of languages, tools, and techniques that facilitate the conversion of a CPU program into a custom hardware design.
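As an illustrative sketch (not the thesis's actual tuner), the per-application search over optimization options can be framed as a simple hill climb over flag subsets. The flag names below are real GCC options, but `mock_cost` is a synthetic stand-in for compiling the application and measuring it, and its penalty values are invented for the example:

```python
import random

# A small, hypothetical subset of GCC optimization flags to search over.
FLAGS = [
    "-funroll-loops", "-finline-functions", "-ftree-vectorize",
    "-fomit-frame-pointer", "-fipa-pta", "-floop-interchange",
]

def mock_cost(flag_subset):
    """Placeholder for invoking the compiler and measuring the result.

    A real tuner would run e.g. `gcc <flags> app.c` and time or size the
    binary; a synthetic score keeps this sketch self-contained. The
    per-flag effects below are assumptions for illustration only.
    """
    effect = {"-funroll-loops": -3, "-finline-functions": -2,
              "-ftree-vectorize": -4, "-fomit-frame-pointer": -1,
              "-fipa-pta": 2, "-floop-interchange": 1}
    return 100 + sum(effect[f] for f in flag_subset)

def hill_climb(iters=200, seed=0):
    """Greedily toggle one flag at a time, keeping only improvements."""
    rng = random.Random(seed)
    best = set()
    best_cost = mock_cost(best)
    for _ in range(iters):
        cand = set(best)
        cand.symmetric_difference_update({rng.choice(FLAGS)})
        c = mock_cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return sorted(best), best_cost
```

Each evaluation here is one (mocked) compile-and-measure cycle; the deep learning models described in the abstract aim to predict good flag sets directly, avoiding most of this search.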
Efficient, performant CPU code, and its underlying intermediate representation, is likely to be well suited for translation to a Hardware Description Language (HDL) that programs an FPGA. The second research thrust is automating the pre-processing of application code, e.g., through the application of directives or pragmas targeting different compilers (GCC, Vitis HLS, Intel HLS) and architectures (CPU and FPGA).

A limitation of the above approaches, which tune compiler heuristics per application, is that although they guarantee performance improvement, they are time-consuming and lack generality. We therefore use deep learning to find a generalized solution that outperforms present approaches. First, we develop a neural-network-based cost function that accurately predicts binary code size for GCC-based compilation. This circumvents the cost-computation bottleneck of invoking a downstream compiler to obtain performance values, and is especially valuable for training models that require a reward reflecting the impact of applied transformations and would otherwise have to invoke the compiler. Second, we develop pre-trained deep learning models that surpass GCC's default -Oz by more accurately predicting optimal compiler transformations for a given application. Our approach is sufficiently practical to be integrated into the compiler as an -OmL option.
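A learned cost function of this kind can be sketched in miniature: fit a model to predict binary size from program features, then query the model instead of invoking the compiler. Everything below is a minimal stand-in under stated assumptions; the features, weights, and data are synthetic, and the thesis model is a deeper network over real program representations:

```python
import numpy as np

# Synthetic training data: the four features stand in for counts of
# applied transformations, and the target for the resulting change in
# binary size. Both the feature meaning and the weights are assumptions.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(256, 4)).astype(float)
true_w = np.array([120.0, -35.0, 50.0, -10.0])   # assumed per-pass effect
y = X @ true_w + rng.normal(0.0, 5.0, size=256)  # noisy observed sizes

# Fit a one-layer linear cost model by gradient descent. Once trained,
# evaluating `X_new @ w` replaces a full compile-and-measure cycle.
w = np.zeros(4)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
    w -= 1e-3 * grad

rmse = float(np.sqrt(np.mean((X @ w - y) ** 2)))
```

The fitted weights recover the sign of each pass's effect on size, which is exactly the signal a reward-driven model needs without calling the compiler on every candidate transformation sequence.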
Description
2025
License
Attribution 4.0 International