Master's Thesis MSTR-2021-44

Bibliographic data
Hsu, Ya-Jen: Learning to Identify Equivalent Code.
Universität Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology, Master's Thesis No. 44 (2021).
43 pages, English.
Abstract

Neural software analysis trains a deep learning model on a given corpus of code to address a program analysis task. Problems suited to neural software analysis, such as bug detection, type prediction, or code summarization, typically start from a given corpus of code, transform code examples into vector representations, and then train and validate a neural model. In this work, we build a neural model for the problem of equivalent code prediction. Specifically, the model compares a given Java source method with a bytecode method, which together form a method pair, and predicts whether the two are equivalent. We use pre-trained embedding models to represent both methods as sequences of tokens, where the vector representation of each token carries semantic information. The training and validation data for our model consist of positive examples (correct method pairs) and negative examples (incorrect method pairs). A positive example simply pairs a Java method with its compiled bytecode. To create negative examples, we propose a feature-based approach that selects a bytecode method that is similar to a given Java method yet has different semantics. Our model achieves an accuracy of 86.3%. We also provide preliminary evidence suggesting the effectiveness of the feature-based approach. Finally, we propose treating equivalence prediction as a pseudo-task that yields vector representations of code which not only address our equivalence prediction task but can also be reused for other software analysis problems.
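
The construction of positive and negative method pairs described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which features are used, so the token-set feature and the Jaccard similarity below are assumptions chosen for illustration, and the names `MethodPair`, `features`, `jaccard`, and `build_pairs` are hypothetical. The sketch also assumes that distinct methods in the corpus have different semantics, which a real pipeline would need to check.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodPair:
    java_source: str
    bytecode: str
    label: int  # 1 = equivalent (positive), 0 = not equivalent (negative)


def features(text):
    """Hypothetical feature: the set of lowercase identifier-like tokens."""
    return set(re.findall(r"[a-z_]\w*", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_pairs(corpus):
    """Build labeled pairs from a list of (java_source, compiled_bytecode) tuples.

    Each Java method yields one positive pair (its own bytecode) and, if
    another method exists, one negative pair: the bytecode of a different
    method whose features are most similar to the Java method.
    """
    pairs = []
    for i, (java_src, bytecode) in enumerate(corpus):
        pairs.append(MethodPair(java_src, bytecode, 1))
        candidates = [bc for j, (_, bc) in enumerate(corpus) if j != i]
        if not candidates:
            continue
        java_feats = features(java_src)
        # Pick the most similar non-matching bytecode as a "hard" negative.
        hardest = max(candidates, key=lambda bc: jaccard(java_feats, features(bc)))
        pairs.append(MethodPair(java_src, hardest, 0))
    return pairs
```

Choosing the most similar non-matching bytecode, rather than a random one, is meant to reflect the abstract's goal of negatives that look like the Java method but differ in semantics. The similarity measure itself is a placeholder.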

Department(s): Universität Stuttgart, Institut für Softwaretechnologie, Software Lab - Programmanalysen
Supervisor(s): Pradel, Prof. Michael; Bulling, Prof. Andreas
Entry date: November 4, 2021