Bachelor Thesis BCLR-2022-09

Bibliography: Senger, Tobias: A unified open- and closed-source software requirements dataset.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Bachelor Thesis No. 9 (2022).
85 pages, English.

Requirements Engineering (RE) has proven to be an important factor for the success of a software project. The common use of natural language for writing requirements often results in problems that should be detected and avoided early. For this reason, we want to build automatic tools that use Deep Learning (DL) to support the process of specifying requirements. However, training robust DL models is very data-intensive, and the RE community still suffers from a lack of large-scale requirement datasets that are easy to use. Therefore, the goal of this study is to create such a dataset that can be used for various tasks in the RE domain. To do this, we collect functional and non-functional requirements from a large number of both open- and closed-source software projects and combine them into a unified dataset using a simple data format. We then train a DL model for automatically classifying functional and non-functional requirements to show the potential of our dataset for training efficient DL models. We compare its performance with that of a state-of-the-art model and of students at the University of Stuttgart. We also examine the differences between the open- and closed-source requirements in our dataset and compare the textual corpus of our dataset with common English datasets and corpora. Our studies showed that our model outperforms both the state-of-the-art model and most of the students. Further, we observed remarkable differences between the open- and closed-source requirements and found that our requirements use a unique vocabulary compared to common English texts.

Department(s): University of Stuttgart, Institute of Software Technology, Empirical Software Engineering
Supervisor(s): Graziotin, Dr. Daniel; Habib, Mohammad Kasra
Entry date: May 24, 2022