Hybrid embedding-based text representation for hierarchical multi-label text classification

•A hybrid embedding-based hierarchical multi-label text classification (HMTC).•A level-by-level local HMTC integrating Bi-GRU model with hybrid embedding.•Extensive experiments were made for illustrating the performance of our method. Many real-world text classification tasks often deal with a large...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Expert systems with applications 2022-01, Vol.187, p.115905, Article 115905
Hauptverfasser: Ma, Yinglong, Liu, Xiaofeng, Zhao, Lijiao, Liang, Yue, Zhang, Peng, Jin, Beihong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:•A hybrid embedding-based hierarchical multi-label text classification (HMTC).•A level-by-level local HMTC integrating Bi-GRU model with hybrid embedding.•Extensive experiments were made for illustrating the performance of our method. Many real-world text classification tasks often deal with a large number of closely related categories organized in a hierarchical structure or taxonomy. Hierarchical multi-label text classification (HMTC) has become rather challenging when it requires handling large sets of closely related categories. The structural features of all categories in the entire hierarchy and the word semantics of their category labels are very helpful in improving text classification accuracy over large sets of closely related categories, which has been neglected in most of existing HMTC approaches. In this paper, we present a hybrid embedding-based text representation for HMTC with high accuracy. First, the hybrid embedding consists of both graph embedding of categories in the hierarchy and their word embedding of category labels. The Structural Deep Network Embedding-based graph embedding model is used to simultaneously encode the global and local structural features of a given category in the whole hierarchy for making the category structurally discriminable. We further use the word embedding technique to encode the word semantics of each category label in the hierarchy for making different categories semantically discriminable. Second, we presented a level-by-level HMTC approach based on the bidirectional Gated Recurrent Unit network model together with the hybrid embedding that is used to learn the representation of the text level-by-level. Last but not least, extensive experiments were made over five large-scale real-world datasets in comparison with the state-of-the-art hierarchical and flat multi-label text classification approaches, and the experimental results show that our approach is very competitive to the state-of-the-art approaches in classification accuracy, in particular maintaining computational costs while achieving superior performance.
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2021.115905