Comparison of text feature selection policies and using an adaptive framework
► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared. Text categorization is the task of automatically assigning unlabeled text...
Published in: | Expert systems with applications 2013-09, Vol.40 (12), p.4871-4886 |
---|---|
Main authors: | Tasci, S ; Guengoer, T |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 4886 |
---|---|
container_issue | 12 |
container_start_page | 4871 |
container_title | Expert systems with applications |
container_volume | 40 |
creator | Tasci, S ; Guengoer, T |
description | ► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared.
Text categorization is the task of automatically assigning unlabeled text documents to predefined category labels by means of an induction algorithm. Because the data in text categorization are high-dimensional, feature selection is often used to reduce the dimensionality. In this paper, we evaluate and compare the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets that vary in size, complexity, and skewness. We use a support vector machine as the classifier and tf-idf for weighting the terms. In addition to evaluating the policies, we propose new feature selection metrics that show high success rates, especially with a low number of keywords. These metrics are two-sided local metrics and are based on the difference between the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting a different number of terms for each class, and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes. |
doi_str_mv | 10.1016/j.eswa.2013.02.019 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1701124612</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417413001358</els_id><sourcerecordid>1530986761</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</originalsourceid><addsrcrecordid>eNqNkcFO3DAQhi1UJLa0L9BTLki9JMzYcRxLXNCqFCQqLuVsGWdceZuNg52F8vb1ahHH0ottyZ9nxv_H2BeEBgG7801D-dk2HFA0wBtAfcRW2CtRd0qLD2wFWqq6RdWesI85bwBQAagV-7GO29mmkONURV8t9GepPNlll6jKNJJbQrmZ4xhcoFzZaah2OUy_yqmyg52X8ESVT3ZLzzH9_sSOvR0zfX7dT9n91bef6-v69u77zfrytnZC86X2oFA61TvSVgjfI4LwwLV23A0cOy9AcuIPctBS9g_eat76QUpHjlDrTpyyr4e6c4qPO8qL2YbsaBztRHGXTfkcIm875P-FQitKYu-jUoDuO9Xh-2jb9qosaj8AP6AuxZwTeTOnsLXpxSCYvTyzMXt5Zi_PADeHUc5e69vs7FgCnlzIby-5Eq3m2Bfu4sBRSfspUDK5eJocDSEVd2aI4V9t_gKVb645</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1448714472</pqid></control><display><type>article</type><title>Comparison of text feature selection policies and using an adaptive framework</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Tasci, S ; Guengoer, T</creator><creatorcontrib>Tasci, S ; Guengoer, T</creatorcontrib><description>► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared.
Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2013.02.019</identifier><language>eng</language><publisher>Amsterdam: Elsevier Ltd</publisher><subject>Adaptive keyword selection ; Algorithms ; Applied sciences ; Artificial intelligence ; Categories ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Document categorization ; Exact sciences and technology ; Expert systems ; Feature selection ; Local and global policies ; Memory organisation. Data processing ; Policies ; Software ; Speech and sound recognition and synthesis. 
Linguistics ; Support vector machines ; Tasks ; Texts ; Weighting</subject><ispartof>Expert systems with applications, 2013-09, Vol.40 (12), p.4871-4886</ispartof><rights>2013 Elsevier Ltd</rights><rights>2014 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</citedby><cites>FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2013.02.019$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=27349218$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Tasci, S</creatorcontrib><creatorcontrib>Guengoer, T</creatorcontrib><title>Comparison of text feature selection policies and using an adaptive framework</title><title>Expert systems with applications</title><description>► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared.
Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.</description><subject>Adaptive keyword selection</subject><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Categories</subject><subject>Computer science; control theory; systems</subject><subject>Data processing. List processing. Character string processing</subject><subject>Document categorization</subject><subject>Exact sciences and technology</subject><subject>Expert systems</subject><subject>Feature selection</subject><subject>Local and global policies</subject><subject>Memory organisation. Data processing</subject><subject>Policies</subject><subject>Software</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Support vector machines</subject><subject>Tasks</subject><subject>Texts</subject><subject>Weighting</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNqNkcFO3DAQhi1UJLa0L9BTLki9JMzYcRxLXNCqFCQqLuVsGWdceZuNg52F8vb1ahHH0ottyZ9nxv_H2BeEBgG7801D-dk2HFA0wBtAfcRW2CtRd0qLD2wFWqq6RdWesI85bwBQAagV-7GO29mmkONURV8t9GepPNlll6jKNJJbQrmZ4xhcoFzZaah2OUy_yqmyg52X8ESVT3ZLzzH9_sSOvR0zfX7dT9n91bef6-v69u77zfrytnZC86X2oFA61TvSVgjfI4LwwLV23A0cOy9AcuIPctBS9g_eat76QUpHjlDrTpyyr4e6c4qPO8qL2YbsaBztRHGXTfkcIm875P-FQitKYu-jUoDuO9Xh-2jb9qosaj8AP6AuxZwTeTOnsLXpxSCYvTyzMXt5Zi_PADeHUc5e69vs7FgCnlzIby-5Eq3m2Bfu4sBRSfspUDK5eJocDSEVd2aI4V9t_gKVb645</recordid><startdate>20130915</startdate><enddate>20130915</enddate><creator>Tasci, S</creator><creator>Guengoer, T</creator><general>Elsevier Ltd</general><general>Elsevier</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20130915</creationdate><title>Comparison of text feature selection policies and using an adaptive framework</title><author>Tasci, S ; Guengoer, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Adaptive keyword selection</topic><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Categories</topic><topic>Computer science; control theory; systems</topic><topic>Data processing. List processing. 
Character string processing</topic><topic>Document categorization</topic><topic>Exact sciences and technology</topic><topic>Expert systems</topic><topic>Feature selection</topic><topic>Local and global policies</topic><topic>Memory organisation. Data processing</topic><topic>Policies</topic><topic>Software</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Support vector machines</topic><topic>Tasks</topic><topic>Texts</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tasci, S</creatorcontrib><creatorcontrib>Guengoer, T</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tasci, S</au><au>Guengoer, T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparison of text feature selection policies and using an adaptive framework</atitle><jtitle>Expert systems with applications</jtitle><date>2013-09-15</date><risdate>2013</risdate><volume>40</volume><issue>12</issue><spage>4871</spage><epage>4886</epage><pages>4871-4886</pages><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared.
Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.</abstract><cop>Amsterdam</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2013.02.019</doi><tpages>16</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0957-4174 |
ispartof | Expert systems with applications, 2013-09, Vol.40 (12), p.4871-4886 |
issn | 0957-4174 ; 1873-6793 |
language | eng |
recordid | cdi_proquest_miscellaneous_1701124612 |
source | Elsevier ScienceDirect Journals Complete |
subjects | Adaptive keyword selection ; Algorithms ; Applied sciences ; Artificial intelligence ; Categories ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Document categorization ; Exact sciences and technology ; Expert systems ; Feature selection ; Local and global policies ; Memory organisation. Data processing ; Policies ; Software ; Speech and sound recognition and synthesis. Linguistics ; Support vector machines ; Tasks ; Texts ; Weighting |
title | Comparison of text feature selection policies and using an adaptive framework |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T08%3A12%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20text%20feature%20selection%20policies%20and%20using%20an%20adaptive%20framework&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Tasci,%20S&rft.date=2013-09-15&rft.volume=40&rft.issue=12&rft.spage=4871&rft.epage=4886&rft.pages=4871-4886&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2013.02.019&rft_dat=%3Cproquest_cross%3E1530986761%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1448714472&rft_id=info:pmid/&rft_els_id=S0957417413001358&rfr_iscdi=true |
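
The abstract in this record describes two-sided local metrics based on the difference between a term's distribution inside and outside a class, and a framework that selects a different number of terms per class. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: the score is taken to be the absolute difference of document-frequency proportions, and the per-class keyword counts are supplied by the caller. The names `doc_freq`, `local_scores`, `select_keywords`, and `budget` are hypothetical and do not come from the paper.

```python
from collections import Counter


def doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def local_scores(docs, labels, target):
    """Score every term for one class.

    Assumed metric (not taken from the paper): the absolute difference
    between the share of in-class documents containing the term and the
    share of out-of-class documents containing it. Taking the absolute
    value makes the score two-sided: terms that signal the class and
    terms that signal its absence both rank highly.
    """
    pos = [d for d, y in zip(docs, labels) if y == target]
    neg = [d for d, y in zip(docs, labels) if y != target]
    df_pos, df_neg = doc_freq(pos), doc_freq(neg)
    n_pos, n_neg = max(len(pos), 1), max(len(neg), 1)
    return {
        term: abs(df_pos[term] / n_pos - df_neg[term] / n_neg)
        for term in set(df_pos) | set(df_neg)
    }


def select_keywords(docs, labels, budget):
    """Keep the top-k terms per class, where k may differ between classes.

    `budget` maps each class to its k. How k should be chosen is not
    stated in the abstract, so it is left to the caller here.
    """
    selected = {}
    for cls, k in budget.items():
        scores = local_scores(docs, labels, cls)
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[cls] = ranked[:k]
    return selected


# Toy corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["match", "goal", "score"],
    ["election", "vote", "party"],
    ["vote", "party", "team"],
    ["chip", "code"],
]
labels = ["sport", "sport", "sport", "politics", "politics", "tech"]

keywords = select_keywords(docs, labels, {"sport": 1, "politics": 2, "tech": 2})
print(keywords)
```

In this toy corpus the skewed class `tech` has a single training document, yet it still receives its own keyword allowance instead of competing with the larger classes for places in one global ranking.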