Operationalizing validity of empirical software engineering studies
Published in: Empirical Software Engineering: An International Journal, 2023-11, Vol. 28 (6), p. 153, Article 153
Main authors: ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Empirical Software Engineering studies apply methods, like linear regression, statistical tests, or correlation analysis, to better understand software engineering scenarios. Assuring the validity of such methods and the corresponding results is challenging but critical. This is also reflected by quality criteria on validity that are part of the reviewing process for the corresponding research results. However, such criteria are often hard to define operationally and thus hard for reviewers to judge. In this paper, we describe a new strategy to define and communicate the validity of methods and results. We conceptually decompose a study into an empirical scenario, a used method, and the produced results. Validity can only be described as the relationship between these three parts. To make the empirical scenario fully operational, we convert informal assumptions about it into executable simulation code that leverages artificial data to replace (or complement) the real data. We can then run the method on the artificial data and examine the impact of our assumptions on the quality of the results. This may operationally i) support the validity of a method for a valid result, ii) threaten the validity of a method for an invalid result if assumptions are controversial, or iii) invalidate a method for an invalid result if assumptions are plausible. We encourage researchers to submit simulations as additional artifacts to the reviewing process to make such statements explicit. Rating whether a simulated scenario is plausible or controversial is subjective and may benefit from involving a reviewer. We show that existing empirical software engineering studies can benefit from such additional validation artifacts.
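To make the abstract's strategy concrete, here is a minimal sketch (not the authors' code; the generator, the assumed effect size, and all function names are hypothetical): an assumption about the empirical scenario is encoded as a data generator, the study's method is run on the artificial data, and the result is compared against the known ground truth.

```python
# Hypothetical sketch of the simulation strategy described in the abstract:
# encode an assumption about the empirical scenario as executable code,
# run the study's method on artificial data, and check whether the method
# recovers the ground truth that the simulation built in.
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_scenario(n=200, effect=2.0, noise_sd=1.0):
    """Assumption: a linear effect with Gaussian noise (illustrative only).
    Varying this generator probes how sensitive the method is to it."""
    x = rng.uniform(0, 10, size=n)                     # e.g., module size
    y = effect * x + rng.normal(0, noise_sd, size=n)   # e.g., defect count
    return x, y

def method(x, y):
    """The method under scrutiny: simple linear regression (least squares)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope

# Run the method repeatedly on artificial data with a known true effect.
true_effect = 2.0
estimates = [method(*simulate_scenario(effect=true_effect)) for _ in range(1000)]
bias = np.mean(estimates) - true_effect
print(f"mean estimate: {np.mean(estimates):.3f}, bias: {bias:+.3f}")
```

A bias near zero would operationally support the method under this particular assumption; swapping in a controversial generator (e.g., heavy-tailed noise or a nonlinear effect) and observing degraded estimates would make a threat to validity explicit.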
ISSN: 1382-3256, 1573-7616
DOI: 10.1007/s10664-023-10370-3