Program synthesis and vulnerability injection using a Grammar VAE
Kosta, Leonard Raymond
MetadataShow full item record
The ability to automatically detect and repair vulnerabilities in code before deployment has become the subject of increasing attention. Some approaches to this problem rely on machine learning techniques, however the lack of datasets–code samples labeled as containing a vulnerability or not–presents a barrier to performance. We design and implement a deep neural network based on the recently developed Grammar Variational Autoencoder (VAE) architecture to generate an arbitrary number of unique C functions labeled in the aforementioned manner. We make several improvements on the original Grammar VAE: we guarantee that every vector in the neural network’s latent space decodes to a syntactically valid C function; we extend the Grammar VAE into a context-sensitive environment; and we implement a semantic repair algorithm that transforms syntactically valid C functions into fully semantically valid C functions that compile and execute. Users can control the semantic qualities of output functions with our constraint system. Our constraints allow users to modify the return type, change control flow structures, inject vulnerabilities into generated code, and more. We demonstrate the advantages of our model over other program synthesis models targeting similar applications. We also explore alternative applications for our model, including code plagiarism detection and compiler fuzzing, testing, and optimization.