Artificial datasets for hierarchical classification

• A novel method for generating hierarchical artificial datasets is proposed.
• Several hierarchical datasets of tree and directed acyclic graph type are generated.
• Hierarchical classification methods are evaluated with the artificial datasets.
• The datasets and the program to generate them are made ava...

Detailed description

Bibliographic details
Published in: Expert systems with applications 2021-11, Vol.182, p.115218, Article 115218
Main authors: Serrano-Pérez, Jonathan; Sucar, L. Enrique
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page 115218
container_title Expert systems with applications
container_volume 182
creator Serrano-Pérez, Jonathan
Sucar, L. Enrique
description • A novel method for generating hierarchical artificial datasets is proposed. • Several hierarchical datasets of tree and directed acyclic graph type are generated. • Hierarchical classification methods are evaluated with the artificial datasets. • The datasets and the program to generate them are made available to the community. Hierarchical classification (HC) is a special type of multilabel classification in which an instance can be associated with multiple labels and the labels are arranged in a predefined structure, commonly a tree but in its general form a directed acyclic graph (DAG). HC includes up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate artificial datasets (ADs), which make it possible to evaluate a method under conditions that may not be present in the available datasets. This work proposes a method that generates artificial datasets for up to four of the hierarchical classification problems and uses distributions to generate the instances. Two groups of ADs, with tree and DAG hierarchies, were generated with the proposed method and are made available to the scientific community; the source code is also provided so that users can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated on the generated artificial datasets. The best performance on the datasets was obtained by a couple of state-of-the-art methods that make use of Bayesian networks and chained classifiers. The proposed method for generating HC datasets provides a flexible and general alternative for evaluating different hierarchical classification methods.
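The description says only that instances are generated from distributions over a tree or DAG hierarchy; it does not give the procedure. The following is a minimal illustrative sketch of that general idea, not the authors' method. It assumes a small tree, one Gaussian per leaf node, and labels formed by the path from the top level to the leaf. The names `PARENT`, `label_path`, and `generate_dataset`, and all parameter values, are hypothetical.

```python
import random

# Hypothetical toy tree hierarchy, stored as child -> parent (None marks the root).
PARENT = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}


def label_path(node, parent=PARENT):
    """Return the labels from the top level down to `node`, excluding the root."""
    path = []
    while parent[node] is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def generate_dataset(n_per_leaf, n_features=2, spread=1.0, seed=0, parent=PARENT):
    """Sample `n_per_leaf` instances for every leaf from a leaf-specific Gaussian."""
    rng = random.Random(seed)
    internal = set(parent.values())
    leaves = sorted(node for node in parent if node not in internal)
    data = []
    for leaf in leaves:
        # Assumption: one randomly placed Gaussian centre per leaf.
        centre = [rng.uniform(-10.0, 10.0) for _ in range(n_features)]
        labels = label_path(leaf, parent)
        for _ in range(n_per_leaf):
            features = [rng.gauss(mu, spread) for mu in centre]
            data.append((features, labels))
    return data


if __name__ == "__main__":
    for features, labels in generate_dataset(n_per_leaf=2):
        print([round(v, 2) for v in features], labels)
```

Because every instance carries its full ancestor path, the labels are consistent with the hierarchy by construction. A DAG variant would need to collect the ancestors along every parent of a node, which this sketch does not attempt.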
doi_str_mv 10.1016/j.eswa.2021.115218
format Article
date 2021-11-15
els_id S0957417421006515
linktohtml https://www.sciencedirect.com/science/article/pii/S0957417421006515
publisher New York: Elsevier Ltd
rights 2021 Elsevier Ltd; Copyright Elsevier BV Nov 15, 2021
toplevel peer_reviewed; online_resources
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2021-11, Vol.182, p.115218, Article 115218
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2576366320
source Elsevier ScienceDirect Journals
subjects Artificial datasets
Bayesian analysis
Classification
Datasets
Evaluation
Hierarchical classification
Hierarchies
Labels
Source code
State-of-the-art reviews
title Artificial datasets for hierarchical classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T11%3A20%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Artificial%20datasets%20for%20hierarchical%20classification&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Serrano-P%C3%A9rez,%20Jonathan&rft.date=2021-11-15&rft.volume=182&rft.spage=115218&rft.pages=115218-&rft.artnum=115218&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2021.115218&rft_dat=%3Cproquest_cross%3E2576366320%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2576366320&rft_id=info:pmid/&rft_els_id=S0957417421006515&rfr_iscdi=true