Bachelor Thesis BCLR-2021-17

Bibliography: Bredl, Paul: Improving the Recall of Searching for Code Changes.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 17 (2021).
59 pages, English.
Abstract

DiffSearch is a search engine for finding specific types of code changes stored in code repositories. Search engines for code fragments already exist, but searching specifically for code changes remains challenging, even though it has various use cases, such as finding bug-fixing patterns. The architecture of DiffSearch ensures scalability and precision, but its recall leaves room for improvement: on average, the tool finds only 57.2% of the code changes relevant to a query. Improving the recall is necessary for the search engine to be useful in practice. DiffSearch extracts structural information from the syntax trees of code fragments and uses the extracted parts as features to create feature vectors. The tool uses these vectors to build an index that supports fast nearest-neighbor search for the code changes relevant to a query. We extend this approach by extracting more information from the code changes and their syntax trees and by using a modified feature extraction for search queries. With our modifications, we increase the recall to 92.8% without reducing performance, scalability, or precision.
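
The pipeline described in the abstract (syntax-tree features, feature vectors, nearest-neighbor search, recall) can be illustrated with a minimal Python sketch. It is not the DiffSearch implementation. Several assumptions are made purely for illustration: Python's built-in `ast` module stands in for the tool's parser, parent-child node-type pairs stand in for its features, and a brute-force cosine-similarity ranking stands in for its index. All function names (`tree_features`, `change_vector`, `search`, `recall`) are hypothetical.

```python
import ast
import math
from collections import Counter


def tree_features(code, side):
    """Collect parent>child node-type pairs from the syntax tree of `code`.

    `side` ("old" or "new") marks which half of the change a feature
    comes from. This feature scheme is an assumption, not DiffSearch's.
    """
    tree = ast.parse(code)
    feats = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            feats.append(f"{side}:{type(parent).__name__}>{type(child).__name__}")
    return feats


def change_vector(old, new):
    """Sparse feature vector (feature -> count) for a change old -> new."""
    return Counter(tree_features(old, "old") + tree_features(new, "new"))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[feat] for feat, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, changes, k):
    """Return the k changes most similar to the query (brute force)."""
    qv = change_vector(*query)
    ranked = sorted(changes, key=lambda c: cosine(qv, change_vector(*c)),
                    reverse=True)
    return ranked[:k]


def recall(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved ones."""
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy corpus of (old, new) code changes.
changes = [
    ("x = foo(a)", "x = foo(a, b)"),   # argument added to a call
    ("if a: pass", "while a: pass"),   # control structure swapped
    ("r = a + b", "r = a - b"),        # operator changed
]

# The query has the same tree structure as the first change.
query = ("y = bar(c)", "y = bar(c, d)")
top = search(query, changes, k=1)
```

In this sketch, `recall` expresses the metric the abstract reports: the share of relevant code changes that the search actually returns. A brute-force ranking does not scale to large repositories, which is why an index for fast nearest-neighbor search is needed in practice.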

Department(s): University of Stuttgart, Institute of Software Technology, Software Lab - Program Analysis
Supervisor(s): Pradel, Prof. Michael; Di Grazia, Luca
Entry date: June 30, 2021