Shedding Light on Software Engineering-specific Metaphors and Idioms
Main authors: , ,
Format: Article
Language: English
Abstract: The use of figurative language, such as metaphors and idioms, is
common in our daily-life communication, and it can also be found in Software
Engineering (SE) channels, such as comments on GitHub. Automatically
interpreting figurative language is a challenging task, even with modern
Large Language Models (LLMs), as it often involves subtle nuances. This is
particularly true in the SE domain, where figurative language is frequently
used to convey technical concepts, often bearing developer affect (e.g.,
"spaghetti code"). Surprisingly, there is a lack of studies on how figurative
language in SE communications impacts the performance of automatic tools that
focus on understanding developer communications, e.g., bug prioritization and
incivility detection. Furthermore, it is an open question to what extent
state-of-the-art LLMs interpret figurative expressions in domain-specific
communication such as software engineering. To address this gap, we study the
prevalence and impact of figurative language in SE communication channels.
This study contributes to understanding the role of figurative language in
SE, the potential of LLMs in interpreting it, and its impact on automated SE
communication analysis. Our results demonstrate the effectiveness of
fine-tuning LLMs with figurative language in SE and its potential impact on
automated tasks that involve affect. We found that, among three
state-of-the-art LLMs, the best fine-tuned versions achieve an average
improvement of 6.66% on a GitHub emotion classification dataset, 7.07% on a
GitHub incivility classification dataset, and 3.71% on a Bugzilla bug report
prioritization dataset.
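The record itself does not describe the fine-tuning setup. As a rough
illustration of the kind of pipeline the abstract alludes to (fine-tuning an
LLM-based classifier on SE text containing figurative language), here is a
minimal sketch using Hugging Face Transformers; the model name, emotion label
set, and toy examples are assumptions for illustration, not the paper's
actual configuration.

```python
# Minimal sketch, NOT the paper's pipeline: fine-tune a generic encoder
# on toy SE-flavored emotion examples containing figurative language.
# Model name, labels, and examples below are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["anger", "joy", "sadness"]  # hypothetical emotion label set
examples = {
    "text": [
        "This spaghetti code is driving me up the wall.",   # idiom + metaphor
        "Finally squashed that heisenbug, what a relief!",  # SE figurative term
        "Staring at this stack trace is soul-crushing.",
    ],
    "label": [0, 1, 2],
}

model_name = "bert-base-uncased"  # placeholder; any encoder classifier works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(labels))

def tokenize(batch):
    # Pad/truncate so the default collator can batch fixed-length tensors.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = Dataset.from_dict(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
)
trainer.train()  # a real run would use the GitHub/Bugzilla datasets and eval
```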
DOI: 10.48550/arxiv.2312.10297