Artificial datasets for hierarchical classification

Bibliographic details
Published in: Expert Systems with Applications, 2021-11, Vol. 182, p. 115218, Article 115218
Main authors: Serrano-Pérez, Jonathan; Sucar, L. Enrique
Format: Article
Language: English
Description
Summary:
• A novel method for generating hierarchical artificial datasets is proposed.
• Several hierarchical datasets of tree and directed acyclic graph type are generated.
• Hierarchical classification methods are evaluated with the artificial datasets.
• The datasets and the program to generate them are made available to the community.

Hierarchical classification (HC) is a special type of multilabel classification. As in multilabel classification, an instance can be associated with multiple labels, but in HC the labels are arranged in a predefined structure, commonly a tree and, in its general form, a Directed Acyclic Graph (DAG). HC includes up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate Artificial Datasets (ADs). ADs are useful for evaluating a method under conditions that may not be present in the available datasets. This work proposes a method that can generate artificial datasets for up to four of the different hierarchical classification problems, and that uses distributions to generate the instances. Two groups of ADs, with Tree and DAG hierarchies, were generated using the proposed method and are made available to the scientific community; the source code is also provided so that others can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated with the generated artificial datasets. The best performance on the datasets was obtained by two state-of-the-art methods that make use of Bayesian networks and chained classifiers. The proposed method for generating HC datasets provides a flexible and general alternative for evaluating different hierarchical classification methods.
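The summary does not describe the generation procedure itself, so the following is only a minimal sketch of the general idea of drawing instances from distributions attached to a label hierarchy. It is not the authors' method. The hierarchy, the node names, the choice of one Gaussian per leaf, and all function names and parameters are assumptions made for illustration.

```python
import random

# Hypothetical tree hierarchy, stored as child -> parent (None marks the root).
PARENT = {
    "root": None,
    "A": "root",
    "B": "root",
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def path_to_root(node, parent=PARENT):
    """Return the labels on the path from just below the root down to `node`."""
    labels = []
    while parent[node] is not None:
        labels.append(node)
        node = parent[node]
    return labels[::-1]


def leaves(parent=PARENT):
    """Return the nodes that are never listed as a parent, in sorted order."""
    internal = set(parent.values())
    return sorted(n for n in parent if n not in internal)


def generate(n_per_leaf=5, n_features=2, spread=0.5, seed=0):
    """Sample instances for every leaf; each instance is labelled with its full path."""
    rng = random.Random(seed)
    dataset = []
    for leaf in leaves():
        # Assumption: one Gaussian per leaf, with a randomly drawn mean vector.
        mean = [rng.uniform(-10, 10) for _ in range(n_features)]
        for _ in range(n_per_leaf):
            x = [rng.gauss(m, spread) for m in mean]
            dataset.append((x, path_to_root(leaf)))
    return dataset
```

Calling `generate()` returns a list of `(features, labels)` pairs in which every instance carries all labels on its path, for example `["A", "A1"]`, which is what distinguishes a hierarchical dataset from a flat multilabel one. A DAG hierarchy would need a node to map to several parents, which this sketch does not cover.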
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2021.115218