Artificial datasets for hierarchical classification
Published in: | Expert systems with applications 2021-11, Vol.182, p.115218, Article 115218 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A novel method for generating hierarchical artificial datasets is proposed. • Several hierarchical datasets of tree and directed acyclic graph type are generated. • Hierarchical classification methods are evaluated with the artificial datasets. • The datasets and the program to generate them are made available to the community.
Hierarchical classification (HC) is a special type of multilabel classification. As in multilabel classification, an instance can be associated with multiple labels, but in HC the labels are arranged in a predefined structure, commonly a tree and, in its general form, a Directed Acyclic Graph (DAG). HC comprises up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate Artificial Datasets (ADs), which make it possible to evaluate a method under conditions that may not be present in the available datasets. This work proposes a method, which uses distributions to generate the instances, that can produce artificial datasets for up to four of the hierarchical classification problems. Furthermore, two groups of ADs, with tree and DAG hierarchies, were generated using the proposed method and are made available to the scientific community; the source code is also provided so that others can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated with the generated artificial datasets. The best performance on these datasets was obtained by a couple of state-of-the-art methods that make use of Bayesian networks and chained classifiers. The proposed method for generating HC datasets provides a flexible and general alternative for evaluating different hierarchical classification methods. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2021.115218 |
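
The summary states only that instances are generated from distributions over a tree or DAG label hierarchy; it does not give the procedure. As a rough illustration of the general idea, and not the authors' method, the following Python sketch assumes a small hypothetical tree (`PARENT`), one Gaussian cluster per leaf class, and labels that always reach a leaf. All names and parameters here are invented for the example.

```python
import random

# Hypothetical tree hierarchy (assumption): child -> parent, root has no parent.
PARENT = {
    "root": None,
    "A": "root", "B": "root",
    "A1": "A", "A2": "A",
    "B1": "B",
}

def path_to_root(node):
    """Return `node` and its ancestors, excluding the root itself."""
    path = []
    while node is not None and PARENT[node] is not None:
        path.append(node)
        node = PARENT[node]
    return path

def leaves():
    """Nodes that are never listed as a parent."""
    parents = set(PARENT.values())
    return [n for n in PARENT if n not in parents]

def generate(n_per_leaf=50, n_features=2, spread=1.0, seed=0):
    """Sample instances from one Gaussian per leaf (an assumed choice).

    Each instance is labelled with its leaf and all ancestors, so the
    label set is always consistent with the hierarchy.
    """
    rng = random.Random(seed)
    data = []
    for leaf in leaves():
        # Random cluster centre for this leaf class.
        centre = [rng.uniform(-10, 10) for _ in range(n_features)]
        labels = tuple(path_to_root(leaf))
        for _ in range(n_per_leaf):
            x = [rng.gauss(c, spread) for c in centre]
            data.append((x, labels))
    return data
```

Calling `generate(n_per_leaf=10)` yields a list of `(features, labels)` pairs, for example a point labelled `("A1", "A")`. A DAG hierarchy would need a node to map to several parents, which this sketch does not cover.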