Bibliography | Hsu, Ya-Jen: Learning to Identify Equivalent Code. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 44 (2021). 43 pages, English.
|
Abstract | Neural software analysis trains a deep learning model on a given corpus of code to address program analysis tasks. Approaches to problems suited to neural software analysis, such as bug detection, type prediction, or code summarization, typically start from a given corpus of code, transform code examples into vector representations, and then train and validate a neural model. In this work, we build a neural model that addresses the problem of equivalent code prediction. Specifically, the model compares a given Java method with a bytecode method, which together form a method pair, and then predicts whether the two are equivalent. We use pre-trained embedding models to represent both methods as sequences of tokens, where the vector representation of each token carries semantic information. The training and validation data for our model consist of positive examples (correct pairs of methods) and negative examples (incorrect pairs of methods). A positive example consists of a Java method and its compiled bytecode. To create negative examples, we propose a feature-based approach that selects a bytecode method that is similar to a given Java method yet has different semantics. The results show that our model achieves an accuracy of 86.3%. We also provide preliminary evidence suggesting the effectiveness of the feature-based approach. Finally, we present the idea of treating our equivalence model as a pseudo-task that provides a vector representation of code, which can not only address our equivalence prediction task but can also be reused for other software analysis problems.
|