Automated data function extraction from textual requirements by leveraging semi-supervised CRF and language model


Bibliographic details
Published in: Information and software technology 2022-03, Vol.143, p.106770, Article 106770
Main authors: Li, Mingyang; Shi, Lin; Wang, Yawen; Wang, Junjie; Wang, Qing; Hu, Jun; Peng, Xinhua; Liao, Weimin; Pi, Guizhen
Format: Article
Language: English
Online access: Full text
Description
Summary: Function Point Analysis (FPA) provides an objective, comparative measure for size estimation in the early stages of software development. When practicing FPA, analysts typically follow these steps: data function (DF) extraction, transactional function extraction, function type classification, and adjustment factor determination. However, because of a lack of approaches and tool support, these steps are usually carried out manually in practice. Existing approaches are hard to apply to FPA because of three challenges: FPA rule-driven extraction, domain-specific parsing, and expensive labeled resources. In this paper, we aim to automate the extraction of DFs, which is the first and fundamental step in FPA. We propose an automated approach named DEX to extract data functions from textual requirements. Specifically, DEX uses the widely used conditional random field (CRF) model to predict the boundary of a data function. In addition, DEX employs a bootstrapping-based algorithm and a DF-oriented language model to further boost performance. We evaluate DEX in two ways: an evaluation on a real industrial dataset and a manual review by domain experts. The evaluation on the real industrial dataset shows that DEX could achieve 80% precision, 84% recall, and 82% F1, and that it outperforms three state-of-the-art baselines. The expert review suggests that DEX could increase precision by 16% and recall by 13% compared with the DFs produced by engineers. DEX could achieve promising results with a small number of labeled requirements and outperform the state-of-the-art approaches. Moreover, DEX could help engineers produce more accurate and complete DFs in an industrial environment.
• Function point analysis.
• Size estimation.
• Natural language processing.
ISSN:0950-5849
1873-6025
DOI:10.1016/j.infsof.2021.106770