Comparison of text feature selection policies and using an adaptive framework

► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► An adaptive keyword selection method is proposed. ► Local and global feature selection performance is compared. Text categorization is the task of automatically assigning unlabeled text...

Detailed description

Bibliographic details
Published in:	Expert systems with applications 2013-09, Vol.40 (12), p.4871-4886
Main authors:	Tasci, S ; Guengoer, T
Format:	Article
Language:	eng
Subjects:	Adaptive keyword selection ; Algorithms ; Applied sciences ; Artificial intelligence ; Categories ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Document categorization ; Exact sciences and technology ; Expert systems ; Feature selection ; Local and global policies ; Memory organisation. Data processing ; Policies ; Software ; Speech and sound recognition and synthesis. Linguistics ; Support vector machines ; Tasks ; Texts ; Weighting
Online access:	Full text

container_end_page	4886
container_issue	12
container_start_page	4871
container_title	Expert systems with applications
container_volume	40
creator	Tasci, S ; Guengoer, T
description	► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► An adaptive keyword selection method is proposed. ► Local and global feature selection performance is compared. Text categorization is the task of automatically assigning unlabeled text documents to predefined category labels by means of an induction algorithm. Because the data in text categorization are high-dimensional, feature selection is often used to reduce the dimensionality. In this paper, we evaluate and compare the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets that vary in size, complexity, and skewness. We use a support vector machine as the classifier and tf-idf to weight the terms. In addition to evaluating the policies, we propose new feature selection metrics that show high success rates, especially with a low number of keywords. These metrics are two-sided local metrics and are based on the difference between the distribution of a term in the documents belonging to a class and its distribution in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting a different number of terms for each class, and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.
doi_str_mv	10.1016/j.eswa.2013.02.019
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1701124612</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417413001358</els_id><sourcerecordid>1530986761</sourcerecordid><originalsourceid>FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</originalsourceid><addsrcrecordid>eNqNkcFO3DAQhi1UJLa0L9BTLki9JMzYcRxLXNCqFCQqLuVsGWdceZuNg52F8vb1ahHH0ottyZ9nxv_H2BeEBgG7801D-dk2HFA0wBtAfcRW2CtRd0qLD2wFWqq6RdWesI85bwBQAagV-7GO29mmkONURV8t9GepPNlll6jKNJJbQrmZ4xhcoFzZaah2OUy_yqmyg52X8ESVT3ZLzzH9_sSOvR0zfX7dT9n91bef6-v69u77zfrytnZC86X2oFA61TvSVgjfI4LwwLV23A0cOy9AcuIPctBS9g_eat76QUpHjlDrTpyyr4e6c4qPO8qL2YbsaBztRHGXTfkcIm875P-FQitKYu-jUoDuO9Xh-2jb9qosaj8AP6AuxZwTeTOnsLXpxSCYvTyzMXt5Zi_PADeHUc5e69vs7FgCnlzIby-5Eq3m2Bfu4sBRSfspUDK5eJocDSEVd2aI4V9t_gKVb645</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1448714472</pqid></control><display><type>article</type><title>Comparison of text feature selection policies and using an adaptive framework</title><source>Elsevier ScienceDirect Journals Complete</source><creator>Tasci, S ; Guengoer, T</creator><creatorcontrib>Tasci, S ; Guengoer, T</creatorcontrib><description>► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared. Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. 
For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2013.02.019</identifier><language>eng</language><publisher>Amsterdam: Elsevier Ltd</publisher><subject>Adaptive keyword selection ; Algorithms ; Applied sciences ; Artificial intelligence ; Categories ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Document categorization ; Exact sciences and technology ; Expert systems ; Feature selection ; Local and global policies ; Memory organisation. Data processing ; Policies ; Software ; Speech and sound recognition and synthesis. 
Linguistics ; Support vector machines ; Tasks ; Texts ; Weighting</subject><ispartof>Expert systems with applications, 2013-09, Vol.40 (12), p.4871-4886</ispartof><rights>2013 Elsevier Ltd</rights><rights>2014 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</citedby><cites>FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2013.02.019$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=27349218$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Tasci, S</creatorcontrib><creatorcontrib>Guengoer, T</creatorcontrib><title>Comparison of text feature selection policies and using an adaptive framework</title><title>Expert systems with applications</title><description>► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared. Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. 
For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.</description><subject>Adaptive keyword selection</subject><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Categories</subject><subject>Computer science; control theory; systems</subject><subject>Data processing. List processing. Character string processing</subject><subject>Document categorization</subject><subject>Exact sciences and technology</subject><subject>Expert systems</subject><subject>Feature selection</subject><subject>Local and global policies</subject><subject>Memory organisation. Data processing</subject><subject>Policies</subject><subject>Software</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Support vector machines</subject><subject>Tasks</subject><subject>Texts</subject><subject>Weighting</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNqNkcFO3DAQhi1UJLa0L9BTLki9JMzYcRxLXNCqFCQqLuVsGWdceZuNg52F8vb1ahHH0ottyZ9nxv_H2BeEBgG7801D-dk2HFA0wBtAfcRW2CtRd0qLD2wFWqq6RdWesI85bwBQAagV-7GO29mmkONURV8t9GepPNlll6jKNJJbQrmZ4xhcoFzZaah2OUy_yqmyg52X8ESVT3ZLzzH9_sSOvR0zfX7dT9n91bef6-v69u77zfrytnZC86X2oFA61TvSVgjfI4LwwLV23A0cOy9AcuIPctBS9g_eat76QUpHjlDrTpyyr4e6c4qPO8qL2YbsaBztRHGXTfkcIm875P-FQitKYu-jUoDuO9Xh-2jb9qosaj8AP6AuxZwTeTOnsLXpxSCYvTyzMXt5Zi_PADeHUc5e69vs7FgCnlzIby-5Eq3m2Bfu4sBRSfspUDK5eJocDSEVd2aI4V9t_gKVb645</recordid><startdate>20130915</startdate><enddate>20130915</enddate><creator>Tasci, S</creator><creator>Guengoer, T</creator><general>Elsevier Ltd</general><general>Elsevier</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20130915</creationdate><title>Comparison of text feature selection policies and using an adaptive framework</title><author>Tasci, S ; Guengoer, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c392t-f0715c78ce9a33f81103f0299c2cd216f3052e2b5d9558bfa924fd55cece19963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Adaptive keyword selection</topic><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Categories</topic><topic>Computer science; control theory; systems</topic><topic>Data processing. List processing. 
Character string processing</topic><topic>Document categorization</topic><topic>Exact sciences and technology</topic><topic>Expert systems</topic><topic>Feature selection</topic><topic>Local and global policies</topic><topic>Memory organisation. Data processing</topic><topic>Policies</topic><topic>Software</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Support vector machines</topic><topic>Tasks</topic><topic>Texts</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tasci, S</creatorcontrib><creatorcontrib>Guengoer, T</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tasci, S</au><au>Guengoer, T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparison of text feature selection policies and using an adaptive framework</atitle><jtitle>Expert systems with applications</jtitle><date>2013-09-15</date><risdate>2013</risdate><volume>40</volume><issue>12</issue><spage>4871</spage><epage>4886</epage><pages>4871-4886</pages><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>► A comprehensive analysis of feature selection metrics is given. ► New feature selection metrics are introduced. ► Adaptive keyword selection method is proposed. ► Local and global feature selection performances are compared. 
Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, often feature selection is used for reducing the dimensionality. In this paper, we make an evaluation and comparison of the feature selection policies used in text categorization by employing some of the popular feature selection metrics. For the experiments, we use datasets which vary in size, complexity, and skewness. We use support vector machine as the classifier and tf-idf weighting for weighting the terms. In addition to the evaluation of the policies, we propose new feature selection metrics which show high success rates especially with low number of keywords. These metrics are two-sided local metrics and are based on the difference of the distributions of a term in the documents belonging to a class and in the documents not belonging to that class. Moreover, we propose a keyword selection framework called adaptive keyword selection. It is based on selecting different number of terms for each class and it shows significant improvement on skewed datasets that have a limited number of training instances for some of the classes.</abstract><cop>Amsterdam</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2013.02.019</doi><tpages>16</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2013-09, Vol.40 (12), p.4871-4886
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_miscellaneous_1701124612
source	Elsevier ScienceDirect Journals Complete
subjects	Adaptive keyword selection ; Algorithms ; Applied sciences ; Artificial intelligence ; Categories ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Document categorization ; Exact sciences and technology ; Expert systems ; Feature selection ; Local and global policies ; Memory organisation. Data processing ; Policies ; Software ; Speech and sound recognition and synthesis. Linguistics ; Support vector machines ; Tasks ; Texts ; Weighting
title	Comparison of text feature selection policies and using an adaptive framework
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T08%3A12%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20text%20feature%20selection%20policies%20and%20using%20an%20adaptive%20framework&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Tasci,%20S&rft.date=2013-09-15&rft.volume=40&rft.issue=12&rft.spage=4871&rft.epage=4886&rft.pages=4871-4886&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2013.02.019&rft_dat=%3Cproquest_cross%3E1530986761%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1448714472&rft_id=info:pmid/&rft_els_id=S0957417413001358&rfr_iscdi=true
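
The description field above contrasts local and global feature selection policies and mentions two-sided metrics built on the difference between a term's distribution inside and outside a class. The record does not give the paper's formulas, so the following is only a minimal Python sketch under stated assumptions: the score `doc_frequency_gap` (the absolute difference of document-frequency ratios) is an illustrative stand-in rather than the authors' metric, the local and global policies follow the usual meaning of those terms in the text categorization literature (one keyword list per class versus one shared list, here combined by maximum), and the per-class `budgets` passed to `adaptive_selection` are hypothetical values chosen by the caller, since the record does not say how the paper sets them.

```python
from collections import Counter


def doc_frequency_gap(docs, labels, target):
    """Score terms by |P(term | class) - P(term | not class)|.

    Assumption: an illustrative two-sided score, not the paper's metric.
    """
    pos = [set(d) for d, y in zip(docs, labels) if y == target]
    neg = [set(d) for d, y in zip(docs, labels) if y != target]
    pos_df = Counter(t for d in pos for t in d)
    neg_df = Counter(t for d in neg for t in d)
    vocab = set(pos_df) | set(neg_df)
    return {
        t: abs(pos_df[t] / max(len(pos), 1) - neg_df[t] / max(len(neg), 1))
        for t in vocab
    }


def top_k(scores, k):
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def local_selection(docs, labels, k):
    """Local policy: a separate list of k keywords for every class."""
    return {c: top_k(doc_frequency_gap(docs, labels, c), k)
            for c in sorted(set(labels))}


def global_selection(docs, labels, k):
    """Global policy: one shared list; class scores are combined by max."""
    combined = {}
    for c in set(labels):
        for term, score in doc_frequency_gap(docs, labels, c).items():
            combined[term] = max(combined.get(term, 0.0), score)
    return top_k(combined, k)


def adaptive_selection(docs, labels, budgets):
    """Different keyword count per class (budgets: hypothetical class -> k)."""
    return {c: top_k(doc_frequency_gap(docs, labels, c), k)
            for c, k in budgets.items()}


# Toy, skewed corpus (made up for illustration): three "sport" documents
# and a single "finance" document, each already tokenized.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["match", "goal", "win"],
    ["stock", "market", "win"],
]
labels = ["sport", "sport", "sport", "finance"]

print(local_selection(docs, labels, 3))
print(global_selection(docs, labels, 3))
print(adaptive_selection(docs, labels, {"sport": 3, "finance": 1}))
```

Because the score is two-sided, terms that signal the absence of a class (such as "stock" for "sport" in the toy corpus) rank as highly as terms that signal its presence. A one-sided score would keep only the positive indicators.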