MSTIL: Multi-cue Shape-aware Transferable Imbalance Learning for effective graphic API recommendation
Published in: The Journal of systems and software, 2023-06, Vol. 200, p. 111650, Article 111650
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Application Programming Interface (API) recommendation based on graphs is a valuable task in the fields of data visualization and software engineering. However, this task was undefined until a recently published paper coined it as Plot2API and introduced a deep learning-based method named SPGNN. Compared to general image classification methods, this dedicated approach uses semantic parsing to exploit deep features and yields better performance. However, its performance declines sharply on unbalanced datasets, limiting its generalizability. To address this issue, we propose a method named Multi-cue Shape-aware Transferable Imbalance Learning (MSTIL), consisting of three major components: Cross-Language Shape-Aware Plot Transfer Learning (CLSAPTL), Cross-Language API Semantic Similarity-based Data Augmentation (CLASSDA), and Imbalance Plot2API Learning (IPL). Motivated by the hierarchical classification of the graphs, CLSAPTL guides the model to learn the graphs' class hierarchy, thereby enabling it to learn more transferable visual features. Given that a graph can be associated with multiple APIs, and motivated by the fact that many APIs with similar functions in different languages have semantically similar names, CLASSDA leverages the samples of APIs with semantically similar names to assist feature learning. Finally, inspired by the essence of the softmax cross-entropy loss, IPL alleviates the imbalance between positive and negative samples during training. We conduct our experiments on two public datasets. Extensive experimental results show that MSTIL improves the performance of classic CNNs as well as the state-of-the-art method, demonstrating its effectiveness. Specifically, MSTIL achieves an average relative mAP improvement of 12.94% across the models on all datasets.
• Plot2API models suffer from overfitting due to data imbalance, which is reflected in two aspects.
• An optimization strategy in MSTIL is proposed to address the overfitting caused by data imbalance.
• MSTIL consists of a pretraining method, a data augmentation scheme, and a new loss function.
• MSTIL achieves an average relative mAP improvement of 12.94% across the models on all datasets.
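The abstract describes IPL only at a high level: it re-balances positive and negative label terms during training, in the spirit of the softmax cross-entropy loss. As a minimal illustrative sketch (not the paper's actual IPL loss; the weighting scheme and parameters below are assumptions), a multi-label binary cross-entropy with a positive-term weight shows the general idea — since a plot maps to few relevant APIs among many candidates, up-weighting the rare positive terms keeps them from being drowned out by negatives:

```python
import math

def weighted_multilabel_bce(probs, labels, pos_weight=1.0, neg_weight=1.0):
    """Imbalance-aware binary cross-entropy over API labels.

    probs:  predicted probability per candidate API
    labels: 1 if the API is relevant to the plot, else 0
    pos_weight / neg_weight: re-balance the two kinds of terms
    (illustrative only; not the loss defined in the paper).
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total += -pos_weight * math.log(p + eps)
        else:
            total += -neg_weight * math.log(1.0 - p + eps)
    return total / len(probs)

# Typical multi-label imbalance: one relevant API, many irrelevant ones.
probs  = [0.9, 0.2, 0.1, 0.05]  # hypothetical model outputs
labels = [1,   0,   0,   0]
plain    = weighted_multilabel_bce(probs, labels)
balanced = weighted_multilabel_bce(probs, labels, pos_weight=3.0)
```

With `pos_weight > 1` the single positive term contributes more to the loss, so gradients from rare positives are not swamped by the many easy negatives.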
ISSN: 0164-1212, 1873-1228
DOI: 10.1016/j.jss.2023.111650