Bibliography | Munoz Baron, Marvin: Validating the threats to validity in program comprehension experiments. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 23 (2022). 63 pages, English.
|
Abstract | BACKGROUND: Designing empirical software engineering methodologies presupposes the existence of evidence that grounds methodological decisions in empirical fact. Understanding what factors may influence experiment results, how to select appropriate samples and study subjects, and how to correctly apply statistical methods is a prerequisite for developing a methodology based on established research. This is especially true in a field such as program comprehension, where hundreds of different contextual factors can alter the validity of the obtained results. Here, researchers often have intuitions about what might threaten the validity of their studies but lack the evidence to support their claims. OBJECTIVE: This study examines the threats to validity in program comprehension experiments to collect evidence of their existence, to understand the context and nature in which they occur, and ultimately to assist researchers in designing controlled experiments with high validity. METHODS: First, we conduct a systematic review that surveys existing program comprehension experiments and summarizes the threats to validity they report. We then follow up on the three most commonly cited threats, performing small-scale systematic reviews and evaluating the collected evidence with an evidence profile to investigate their influence as threats. RESULTS: We found that only 31 out of 409 (8 %) individual threat mentions were reported with supporting evidence. Furthermore, for the three most common threats, programming experience, program length, and comprehension measures, we found that contextual factors such as how measurements are made, the individual characteristics of the population sample, and what concrete tasks are employed all change how a threat impacts the results of a study. CONCLUSION: Threats to validity are highly context-dependent and must therefore be controlled in different ways. Researchers should use existing evidence to inform their decision-making and explicitly address both why a threat poses a danger and how they controlled it in the context of their study. To this end, we need structured guidelines for reporting threats to validity as well as public knowledge bases that collect threats, evidence, and mitigation techniques for program comprehension experiments.
|