Application of seq2seq models on code correction

Chin, Sang; Huang, Shan; Zhou, Xiao

Files

2001.11367-2.pdf (2.35 MB)

First author draft

Date

2020

Authors

Chin, Sang

Huang, Shan

Zhou, Xiao

Version

First author draft

URI

https://hdl.handle.net/2144/43314

Citation

S. Chin, S. Huang, X. Zhou. "Application of Seq2Seq Models on Code Correction." https://arxiv.org/abs/2001.11367.

Abstract

We apply various seq2seq models to programming language correction tasks on the Juliet Test Suite for C/C++ and Java of the Software Assurance Reference Datasets (SARD), and achieve repair rates of 75% (for C/C++) and 56% (for Java) on these tasks. We introduce the Pyramid Encoder in these seq2seq models, which substantially improves computational and memory efficiency while maintaining a repair rate similar to that of their non-pyramid counterparts. We successfully carry out an error type classification task on ITC benchmark examples (only 685 code instances) using transfer learning with models pre-trained on the Juliet Test Suite, pointing to a novel way of processing small programming language datasets.

Collections

BU Open Access Articles
CAS: Computer Science: Scholarly Papers
