Artificial datasets for hierarchical classification
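Published in: | Expert systems with applications 2021-11, Vol.182, p.115218, Article 115218 |
---|---|
Main authors: | Serrano-Pérez, Jonathan ; Sucar, L. Enrique |
Format: | Article |
Language: | eng |
Subjects: | Artificial datasets ; Bayesian analysis ; Classification ; Datasets ; Evaluation ; Hierarchical classification ; Hierarchies ; Labels ; Source code ; State-of-the-art reviews |
Online access: | Full text |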
container_end_page | |
---|---|
container_issue | |
container_start_page | 115218 |
container_title | Expert systems with applications |
container_volume | 182 |
creator | Serrano-Pérez, Jonathan ; Sucar, L. Enrique |
description | • A novel method for generating hierarchical artificial datasets is proposed. • Several hierarchical datasets of tree and directed acyclic graph type are generated. • Hierarchical classification methods are evaluated with the artificial datasets. • The datasets and the program to generate them are made available to the community.
Hierarchical classification (HC) is a special type of multilabel classification: an instance can be associated with multiple labels, but in HC the labels are arranged in a predefined structure, commonly a tree and, in its general form, a Directed Acyclic Graph (DAG). HC comprises up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate Artificial Datasets (ADs), which make it possible to evaluate a method under conditions that may not be present in the available datasets. This work proposes a method that generates artificial datasets for up to four of the different hierarchical classification problems, using distributions to generate the instances. Furthermore, two groups of ADs, with Tree and DAG hierarchies, were generated using the proposed method and are made available to the scientific community; the source code is also provided so that users can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated with the generated artificial datasets. The best performance on these datasets was obtained by a couple of state-of-the-art methods that use Bayesian networks and chained classifiers. The proposed method for generating HC datasets provides a flexible and general alternative for evaluating different hierarchical classification methods. |
doi_str_mv | 10.1016/j.eswa.2021.115218 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2576366320</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417421006515</els_id><sourcerecordid>2576366320</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-2c9558d37d37f21811fe6b0442f2be006abfe1111f7bbf36c2afaa588b4a334e3</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWKt_wFPB866ZZDfZgpdStAoFL3oOs9kJzVK7NUkV_71Z1rPDwBzmfefjYewWeAkc1H1fUvzGUnABJUAtoDljM2i0LJReynM248taFxXo6pJdxdhzDppzPWNyFZJ33nrcLzpMGCnFhRvCYucpYLA7b3PH7jHGUYbJD4drduFwH-nmr87Z-9Pj2_q52L5uXtarbWGlaFIh7LKum07qnC5fBOBItbyqhBMtca6wdQQ5nG5bJ5UV6BDrpmkrlLIiOWd309xjGD5PFJPph1M45JVG1FpJpaTgWSUmlQ1DjIGcOQb_geHHADcjHNObEY4Z4ZgJTjY9TCbK93_lV020ng6WOh_IJtMN_j_7L1uHbN4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2576366320</pqid></control><display><type>article</type><title>Artificial datasets for hierarchical classification</title><source>Elsevier ScienceDirect Journals</source><creator>Serrano-Pérez, Jonathan ; Sucar, L. Enrique</creator><creatorcontrib>Serrano-Pérez, Jonathan ; Sucar, L. Enrique</creatorcontrib><description>•A novel method for generating hierarchical artificial datasets is proposed.•Several hierarchical datasets of tree and directed acyclic graph type are generated.•Hierarchical classification methods are evaluated with the artificial datasets.•The datasets and the program to generate them are made available to the community.
Hierarchical classification (HC) is a special type of multilabel classification: an instance can be associated with multiple labels, but in HC the labels are arranged in a predefined structure, commonly a tree and, in its general form, a Directed Acyclic Graph (DAG). HC comprises up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate Artificial Datasets (ADs), which make it possible to evaluate a method under conditions that may not be present in the available datasets. This work proposes a method that generates artificial datasets for up to four of the different hierarchical classification problems, using distributions to generate the instances. Furthermore, two groups of ADs, with Tree and DAG hierarchies, were generated using the proposed method and are made available to the scientific community; the source code is also provided so that users can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated with the generated artificial datasets. The best performance on these datasets was obtained by a couple of state-of-the-art methods that use Bayesian networks and chained classifiers. 
The proposed method for generating HC datasets provides a flexible and general alternative to evaluate different hierarchical classification methods.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2021.115218</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Artificial datasets ; Bayesian analysis ; Classification ; Datasets ; Evaluation ; Hierarchical classification ; Hierarchies ; Labels ; Source code ; State-of-the-art reviews</subject><ispartof>Expert systems with applications, 2021-11, Vol.182, p.115218, Article 115218</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright Elsevier BV Nov 15, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-2c9558d37d37f21811fe6b0442f2be006abfe1111f7bbf36c2afaa588b4a334e3</citedby><cites>FETCH-LOGICAL-c328t-2c9558d37d37f21811fe6b0442f2be006abfe1111f7bbf36c2afaa588b4a334e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0957417421006515$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Serrano-Pérez, Jonathan</creatorcontrib><creatorcontrib>Sucar, L. Enrique</creatorcontrib><title>Artificial datasets for hierarchical classification</title><title>Expert systems with applications</title><description>•A novel method for generating hierarchical artificial datasets is proposed.•Several hierarchical datasets of tree and directed acyclic graph type are generated.•Hierarchical classification methods are evaluated with the artificial datasets.•The datasets and the program to generate them are made available to the community.
Hierarchical classification (HC) is a special type of multilabel classification: an instance can be associated with multiple labels, but in HC the labels are arranged in a predefined structure, commonly a tree and, in its general form, a Directed Acyclic Graph (DAG). HC comprises up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate Artificial Datasets (ADs), which make it possible to evaluate a method under conditions that may not be present in the available datasets. This work proposes a method that generates artificial datasets for up to four of the different hierarchical classification problems, using distributions to generate the instances. Furthermore, two groups of ADs, with Tree and DAG hierarchies, were generated using the proposed method and are made available to the scientific community; the source code is also provided so that users can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated with the generated artificial datasets. The best performance on these datasets was obtained by a couple of state-of-the-art methods that use Bayesian networks and chained classifiers. 
The proposed method for generating HC datasets provides a flexible and general alternative to evaluate different hierarchical classification methods.</description><subject>Artificial datasets</subject><subject>Bayesian analysis</subject><subject>Classification</subject><subject>Datasets</subject><subject>Evaluation</subject><subject>Hierarchical classification</subject><subject>Hierarchies</subject><subject>Labels</subject><subject>Source code</subject><subject>State-of-the-art reviews</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LAzEQhoMoWKt_wFPB866ZZDfZgpdStAoFL3oOs9kJzVK7NUkV_71Z1rPDwBzmfefjYewWeAkc1H1fUvzGUnABJUAtoDljM2i0LJReynM248taFxXo6pJdxdhzDppzPWNyFZJ33nrcLzpMGCnFhRvCYucpYLA7b3PH7jHGUYbJD4drduFwH-nmr87Z-9Pj2_q52L5uXtarbWGlaFIh7LKum07qnC5fBOBItbyqhBMtca6wdQQ5nG5bJ5UV6BDrpmkrlLIiOWd309xjGD5PFJPph1M45JVG1FpJpaTgWSUmlQ1DjIGcOQb_geHHADcjHNObEY4Z4ZgJTjY9TCbK93_lV020ng6WOh_IJtMN_j_7L1uHbN4</recordid><startdate>20211115</startdate><enddate>20211115</enddate><creator>Serrano-Pérez, Jonathan</creator><creator>Sucar, L. Enrique</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20211115</creationdate><title>Artificial datasets for hierarchical classification</title><author>Serrano-Pérez, Jonathan ; Sucar, L. 
Enrique</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-2c9558d37d37f21811fe6b0442f2be006abfe1111f7bbf36c2afaa588b4a334e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Artificial datasets</topic><topic>Bayesian analysis</topic><topic>Classification</topic><topic>Datasets</topic><topic>Evaluation</topic><topic>Hierarchical classification</topic><topic>Hierarchies</topic><topic>Labels</topic><topic>Source code</topic><topic>State-of-the-art reviews</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Serrano-Pérez, Jonathan</creatorcontrib><creatorcontrib>Sucar, L. Enrique</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Serrano-Pérez, Jonathan</au><au>Sucar, L. 
Enrique</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Artificial datasets for hierarchical classification</atitle><jtitle>Expert systems with applications</jtitle><date>2021-11-15</date><risdate>2021</risdate><volume>182</volume><spage>115218</spage><pages>115218-</pages><artnum>115218</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•A novel method for generating hierarchical artificial datasets is proposed.•Several hierarchical datasets of tree and directed acyclic graph type are generated.•Hierarchical classification methods are evaluated with the artificial datasets.•The datasets and the program to generate them are made available to the community.
Hierarchical classification (HC) is a special type of multilabel classification: an instance can be associated with multiple labels, but in HC the labels are arranged in a predefined structure, commonly a tree and, in its general form, a Directed Acyclic Graph (DAG). HC comprises up to eight different problems, and when a method is proposed to solve one of them, the real-world datasets available for that problem are limited. One way to extend the evaluation of a method is therefore to generate Artificial Datasets (ADs), which make it possible to evaluate a method under conditions that may not be present in the available datasets. This work proposes a method that generates artificial datasets for up to four of the different hierarchical classification problems, using distributions to generate the instances. Furthermore, two groups of ADs, with Tree and DAG hierarchies, were generated using the proposed method and are made available to the scientific community; the source code is also provided so that users can generate their own datasets. Finally, standard and state-of-the-art methods were evaluated with the generated artificial datasets. The best performance on these datasets was obtained by a couple of state-of-the-art methods that use Bayesian networks and chained classifiers. The proposed method for generating HC datasets provides a flexible and general alternative for evaluating different hierarchical classification methods.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2021.115218</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0957-4174 |
ispartof | Expert systems with applications, 2021-11, Vol.182, p.115218, Article 115218 |
issn | 0957-4174 1873-6793 |
language | eng |
recordid | cdi_proquest_journals_2576366320 |
source | Elsevier ScienceDirect Journals |
subjects | Artificial datasets ; Bayesian analysis ; Classification ; Datasets ; Evaluation ; Hierarchical classification ; Hierarchies ; Labels ; Source code ; State-of-the-art reviews |
title | Artificial datasets for hierarchical classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T11%3A20%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Artificial%20datasets%20for%20hierarchical%20classification&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Serrano-P%C3%A9rez,%20Jonathan&rft.date=2021-11-15&rft.volume=182&rft.spage=115218&rft.pages=115218-&rft.artnum=115218&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2021.115218&rft_dat=%3Cproquest_cross%3E2576366320%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2576366320&rft_id=info:pmid/&rft_els_id=S0957417421006515&rfr_iscdi=true |
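The abstract describes generating instances from distributions for labels arranged in a tree or a DAG. This record does not give the authors' generator, so the following is only a rough sketch under stated assumptions, not their method: a hypothetical DAG (`PARENTS`), a Gaussian mean offset per label, and label sets formed by drawing a leaf and adding all of its ancestors, so that every instance respects the hierarchy. The names `PARENTS`, `ancestors` and `generate` are illustrative.

```python
import random

# Hypothetical DAG hierarchy (assumption): each label maps to its parents.
# "root" is an implicit top node; in a tree every label has exactly one parent.
PARENTS = {
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "AB": ["A", "B"],  # two parents, so this hierarchy is a DAG, not a tree
}


def ancestors(label, parents):
    """Return the label plus all of its ancestors (excluding the implicit root)."""
    closed, stack = set(), [label]
    while stack:
        node = stack.pop()
        if node == "root" or node in closed:
            continue
        closed.add(node)
        stack.extend(parents[node])
    return closed


def generate(n_instances, n_features, parents, seed=0, spread=1.0):
    """Sample (features, label_set) pairs from assumed Gaussian distributions."""
    rng = random.Random(seed)
    labels = sorted(parents)
    # Each label gets a random mean offset; an instance's mean is the sum of
    # the offsets of every label in its ancestor-closed label set.
    offsets = {lab: [rng.gauss(0.0, 3.0) for _ in range(n_features)] for lab in labels}
    # Leaves are labels that never appear as a parent of another label.
    leaves = [lab for lab in labels if not any(lab in ps for ps in parents.values())]
    data = []
    for _ in range(n_instances):
        label_set = ancestors(rng.choice(leaves), parents)
        mean = [sum(offsets[lab][j] for lab in label_set) for j in range(n_features)]
        x = [rng.gauss(m, spread) for m in mean]
        data.append((x, label_set))
    return data


if __name__ == "__main__":
    for x, ys in generate(3, 2, PARENTS):
        print([round(v, 2) for v in x], sorted(ys))
```

Closing each sampled leaf under its ancestors keeps every label set consistent with the hierarchy; a tree is the special case in which each entry of `PARENTS` has a single parent. The sketch only produces label sets that end at a leaf, which is one of several HC problem variants; other variants would need a different sampling rule.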