A comparative study of filter-based feature ranking techniques

One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models.

Full description

Saved in:
Bibliographic details
Main authors: Huanjing Wang, Khoshgoftaar, T M, Kehan Gao
Format: Conference Proceeding
Language: English
Subjects: Analysis of variance; Measurement; Niobium; Radio frequency; Software; Software algorithms; Support vector machines
Online access: Order full text
container_end_page 48
container_issue
container_start_page 43
container_title 2010 IEEE International Conference on Information Reuse & Integration
container_volume
creator Huanjing Wang
Khoshgoftaar, T M
Kehan Gao
description One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models. To evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually captures only one specific aspect of classification performance, it may be unable to evaluate that performance from different perspectives. Moreover, there is no general consensus among researchers and practitioners regarding which performance metrics should be used to evaluate classification performance. In this study, we investigated six filter-based feature ranking techniques and built classification models using five different classifiers. The models were evaluated using eight different performance metrics. All experiments were conducted on four imbalanced data sets from a telecommunications software system. The experimental results demonstrate that the choice of performance metric may significantly influence the conclusions of a classification evaluation: one ranker may outperform another on a given performance metric, while for a different metric the results are reversed. In this study, we found five distinct patterns when using the eight performance metrics to order the six feature selection techniques.
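
The workflow the abstract describes (rank features with a filter method, select the top-ranked subset, train a classifier, and score it with several metrics) can be sketched as below. This is a minimal illustration, not the authors' setup: the two rankers (chi-squared and mutual information), the logistic-regression classifier, the three metrics, and the synthetic imbalanced data set are all stand-ins chosen for the sketch, implemented with scikit-learn.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the telecom software data sets.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Filter-based rankers: score each feature against the target, independently
# of any classifier. (chi2 needs non-negative inputs, hence the shift.)
rankers = {
    "chi2": lambda X, y: chi2(X - X.min(axis=0), y)[0],
    "mutual_info": lambda X, y: mutual_info_classif(X, y, random_state=0),
}

k = 10  # keep the k top-ranked features
for name, score_fn in rankers.items():
    scores = score_fn(X_train, y_train)
    top_k = np.argsort(scores)[::-1][:k]  # indices of the best-ranked features

    # Build a classification model on the selected feature subset only.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[:, top_k], y_train)
    y_pred = clf.predict(X_test[:, top_k])
    y_prob = clf.predict_proba(X_test[:, top_k])[:, 1]

    # Evaluate with several metrics; different metrics can order the rankers
    # differently, which is the paper's central observation.
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, y_pred),
          "f1=%.3f" % f1_score(y_test, y_pred),
          "auc=%.3f" % roc_auc_score(y_test, y_prob))

Because the data are imbalanced, accuracy can rank the two hypothetical rankers one way while F-measure or AUC ranks them the other way, which is the metric sensitivity the abstract reports.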
doi_str_mv 10.1109/IRI.2010.5558966
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISBN: 1424480973
ispartof 2010 IEEE International Conference on Information Reuse & Integration, 2010, p.43-48
issn
language eng
recordid cdi_ieee_primary_5558966
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Analysis of variance
Measurement
Niobium
Radio frequency
Software
Software algorithms
Support vector machines
title A comparative study of filter-based feature ranking techniques
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T20%3A38%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20comparative%20study%20of%20filter-based%20feature%20ranking%20techniques&rft.btitle=2010%20IEEE%20International%20Conference%20on%20Information%20Reuse%20&%20Integration&rft.au=Huanjing%20Wang&rft.date=2010-08&rft.spage=43&rft.epage=48&rft.pages=43-48&rft.isbn=1424480973&rft.isbn_list=9781424480975&rft_id=info:doi/10.1109/IRI.2010.5558966&rft_dat=%3Cieee_6IE%3E5558966%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424480981&rft.eisbn_list=142448099X&rft.eisbn_list=9781424480999&rft.eisbn_list=9781424480982&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5558966&rfr_iscdi=true