A partition-based problem transformation algorithm for classifying imbalanced multi-label data

Bibliographic details
Published in: Engineering applications of artificial intelligence 2024-02, Vol.128, p.107506, Article 107506
Main authors: Duan, Jicong; Yang, Xibei; Gao, Shang; Yu, Hualong
Format: Article
Language: English
Description
Summary: Multi-label learning has garnered much research interest due to its wide range of real-world applications. Many multi-label learning methods have been proposed; however, few have addressed the class imbalance problem existing in multi-label data. Even though some studies have taken this issue into account, most of them have ignored the label correlations or only considered random correlations between labels. In this study, we propose a novel partition-based imbalanced multi-label learning algorithm, named Multi-label Learning based on Hierarchical Clustering (MLHC), to tackle this problem. MLHC first carries out hierarchical clustering on the original label space to divide it into several disconnected subspaces, each of which contains several labels that are strongly correlated with each other. Then, for each label subspace, we use the problem transformation strategy to convert it into a multi-class problem by binary coding. Any multi-class imbalance learning algorithm can be applied to the transformed multi-class data. Finally, the classification results are decoded to retrieve the corresponding label subspace, and all label subspace results are combined to form the predicted label vector in the original label space. We conducted experiments on thirteen benchmark multi-label datasets as well as on XJTU-SY, a multi-label engineering application dataset. The results indicated that our proposed MLHC learning algorithm outperforms several state-of-the-art class-imbalance multi-label learning algorithms, demonstrating the effectiveness and necessity of discovering label correlations and transforming the original imbalanced multi-label learning problem into multiple strongly correlated multi-class imbalanced learning problems.
ISSN:0952-1976
1873-6769
DOI:10.1016/j.engappai.2023.107506
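
Illustrative sketch (not part of the catalog record): the pipeline described in the summary, namely partitioning the label space, binary-coding each label subspace into a multi-class target, and decoding back, might look roughly like the Python below. This is not the authors' MLHC implementation. The summary does not specify the correlation measure, the linkage criterion, or how the number of subspaces is chosen, so the Jaccard co-occurrence similarity, average linkage, and the `n_groups` parameter are all assumptions, as are the function names. The multi-class imbalance learner is left out; it would be trained on the encoded class ids of each group.

```python
from itertools import combinations

def jaccard(Y, a, b):
    """Co-occurrence similarity of label columns a and b (assumed measure)."""
    both = sum(1 for row in Y if row[a] and row[b])
    either = sum(1 for row in Y if row[a] or row[b])
    return both / either if either else 0.0

def partition_labels(Y, n_groups):
    """Average-linkage agglomerative clustering of label indices (assumed
    linkage) into n_groups disjoint groups of correlated labels."""
    n_labels = len(Y[0])
    sim = {(a, b): jaccard(Y, a, b)
           for a, b in combinations(range(n_labels), 2)}

    def linkage(g, h):
        total = sum(sim[min(a, b), max(a, b)] for a in g for b in h)
        return total / (len(g) * len(h))

    groups = [[j] for j in range(n_labels)]
    while len(groups) > n_groups:
        # Merge the two groups with the highest average similarity.
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda p: linkage(groups[p[0]], groups[p[1]]))
        groups[i] = sorted(groups[i] + groups[j])
        del groups[j]  # j > i, so index i is unaffected
    return groups

def encode(row, group):
    """Binary-code the labels of one group into a single multi-class id."""
    code = 0
    for j in group:
        code = (code << 1) | row[j]
    return code

def decode(code, group, out):
    """Write the bits of a class id back into the label vector `out`."""
    for j in reversed(group):
        out[j] = code & 1
        code >>= 1

# Toy label matrix: labels 0 and 1 co-occur, as do labels 2 and 3.
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

groups = partition_labels(Y, 2)  # [[0, 1], [2, 3]] for this toy matrix

# One multi-class target per group; a multi-class imbalance learner
# would be fitted on each of these columns and its predictions decoded.
targets = [[encode(row, g) for g in groups] for row in Y]

decoded = []
for codes in targets:
    out = [0] * len(Y[0])
    for code, g in zip(codes, groups):
        decode(code, g, out)
    decoded.append(out)
```

Here the encoded targets are decoded directly, which only demonstrates that the transformation is lossless. In a real pipeline the per-group class ids would be replaced by classifier predictions before decoding.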