A Study of Using Multimodal LLMs for Non-Crash Functional Bug Detection in Android Apps
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Numerous approaches employing various strategies have been developed to test
the graphical user interfaces (GUIs) of mobile apps. However, traditional GUI
testing techniques, such as random and model-based testing, primarily focus on
generating test sequences that excel in achieving high code coverage but often
fail to act as effective test oracles for non-crash functional (NCF) bug
detection. To tackle these limitations, this study empirically investigates the
capability of large language models (LLMs) to serve as test oracles that
detect NCF bugs in Android apps. Our intuition is that the training corpora of
LLMs, encompassing extensive mobile app usage and bug report descriptions,
equip them with the domain knowledge relevant to NCF bug detection. We
conducted a comprehensive empirical study to explore the effectiveness of LLMs
as test oracles for detecting NCF bugs in Android apps on 71 well-documented
NCF bugs. The results demonstrated that LLMs achieve a 49% bug detection rate,
outperforming existing tools for detecting NCF bugs in Android apps.
Additionally, by using LLMs as test oracles, we successfully detected
24 previously unknown NCF bugs in 64 Android apps, with four of these bugs
being confirmed or fixed. However, we also identified limitations of LLMs,
primarily related to performance degradation, inherent randomness, and false
positives. Our study highlights the potential of leveraging LLMs as test
oracles for Android NCF bug detection and suggests directions for future
research. |
---|---|
DOI: | 10.48550/arxiv.2407.19053 |