Reshaping input spaces to fuzz complex targets

Date
2024
Authors
Bulekov, Alexander
Abstract
In recent years, fuzz testing has emerged as the dominant technique for automatically finding security issues in software. “Off-the-shelf” fuzzers such as AFL(++) and libFuzzer (Fioraldi et al., 2020; Serebryany, 2015) have been successfully applied to a wide range of software. The OSS-Fuzz project alone fuzzes over a thousand open-source projects and has found more than 40,000 bugs to date. However, most prolific fuzzers are designed for applications with well-defined APIs for ingesting inputs, such as image parsers. Applying fuzzers to targets with unconstrained and semantically complex input spaces, such as operating-system kernels, hypervisors, and browsers, has proved difficult. Most fuzzers for these targets rely on an intermediate “grammar” layer between the fuzzing engine and the target to produce meaningful inputs. While effective, grammars require a significant amount of manual effort from an expert to write, and the approach scales poorly given the enormous amount of new code added to complex software every day. In this thesis, we introduce input-space reshaping as a solution to the problem of fuzzing systems with semantically complex input spaces. Although complex systems often feature clear interface boundaries, they usually accept input data not only by listening for input requests but also by reading data directly across the interface boundary. Reshaping exploits this common design paradigm by hooking into both types of accesses, giving fuzzers a precise view of the input data accessed by a system without requiring prior knowledge of input semantics. Leveraging reshaping, we found that making minor modifications to the way a target ingests inputs, and providing key feedback to the fuzzer throughout input execution, drastically increases the efficiency of fuzzing complex targets with off-the-shelf fuzzing methods, without intermediate grammars.
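To make the reshaping idea more concrete, the following is a minimal Python sketch under stated assumptions; it is not the thesis's implementation, which hooks real hypervisor and kernel code. `FuzzInput`, `ReshapedMemory`, `ToyDevice`, and `run_one` are hypothetical names, and the toy device's 2-byte descriptor format (opcode, payload length) is invented for illustration. A single fuzzer-provided byte string serves both kinds of accesses: explicit input requests (a "doorbell" naming a descriptor address) and direct reads across the interface boundary (the device fetching its descriptor and payload from memory). Because every direct read is answered on demand from the input stream, the harness never needs to know the descriptor format.

```python
# Hypothetical sketch of input-space reshaping; all names are illustrative.


class FuzzInput:
    """Fuzzer-provided bytes, consumed on demand as the target asks for data."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        # Hand out the next n bytes; zero-pad once the input runs out.
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk + bytes(n - len(chunk))

    def exhausted(self) -> bool:
        return self.pos >= len(self.data)


class ReshapedMemory:
    """Hook on direct reads across the interface boundary (e.g. guest memory).

    Instead of returning pre-populated memory, every read is answered
    with fresh bytes taken from the fuzz input.
    """

    def __init__(self, fuzz: FuzzInput) -> None:
        self.fuzz = fuzz
        self.reads: list[tuple[int, int]] = []  # (address, length) of each access

    def read(self, addr: int, length: int) -> bytes:
        self.reads.append((addr, length))
        return self.fuzz.take(length)


class ToyDevice:
    """Invented target: a doorbell request points at a 2-byte descriptor
    (opcode, payload length) that the device then fetches from memory itself."""

    def __init__(self, mem: ReshapedMemory) -> None:
        self.mem = mem
        self.handled: list[int] = []
        self.payloads: list[bytes] = []

    def doorbell(self, addr: int) -> None:
        opcode, length = self.mem.read(addr, 2)    # direct read: descriptor
        payload = self.mem.read(addr + 2, length)  # direct read: payload
        self.handled.append(opcode)
        self.payloads.append(payload)


def run_one(data: bytes) -> ToyDevice:
    """Execute one fuzz input against a fresh device."""
    fuzz = FuzzInput(data)
    dev = ToyDevice(ReshapedMemory(fuzz))
    while not fuzz.exhausted():
        # Request-type access: the next input byte selects the doorbell address.
        # Each iteration consumes at least one byte, so the loop terminates.
        addr = fuzz.take(1)[0]
        dev.doorbell(addr)
    return dev


if __name__ == "__main__":
    dev = run_one(bytes([0x10, 0x07, 0x03, 0xAA, 0xBB, 0xCC]))
    print(dev.handled, dev.payloads, dev.mem.reads)
```

For example, `run_one(bytes([0x10, 0x07, 0x03, 0xAA, 0xBB, 0xCC]))` issues one doorbell at address 0x10; the device then reads its descriptor (opcode 7, length 3) and the 3-byte payload directly from the input. The `reads` log records which addresses the target touched during the run, the kind of per-execution information a real harness might report back to the fuzzer (an assumption here, not a description of the thesis's feedback mechanism).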
Furthermore, we found that, in some cases, reshaping can be applied without any access to target source code by leveraging inherent characteristics of the target. To support these claims, we describe three applications of reshaping:
1. Fuzzing open-source hypervisors, by making minor modifications to the hypervisor source code.
2. Fuzzing the Linux kernel without effort-intensive system-call descriptions.
3. Fuzzing arbitrary closed-source hypervisors, without any modifications to source code.
We detail our implementation of reshaping for each of these targets and describe the results of our fuzzing campaigns compared with other state-of-the-art approaches. We demonstrate that although reshaping has a low initial implementation cost for the security engineer (less than 2% lines of code required per interface), it remains competitive with fuzzers outfitted with meticulously crafted grammars.
License
Attribution 4.0 International