Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization

Text Categorization (TC) is an important task within Information Retrieval (IR). Feature Selection (FS) is crucial in TC because irrelevant features degrade performance. FS is usually performed by selecting the features with the highest scores according to certain me...

Bibliographic details
Main authors: Montañés, E., Combarro, E. F., Díaz, I., Ranilla, J.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Full text
container_end_page 248
container_issue
container_start_page 239
container_title
container_volume
creator Montañés, E.
Combarro, E. F.
Díaz, I.
Ranilla, J.
description Text Categorization (TC) is an important task within Information Retrieval (IR). Feature Selection (FS) is crucial in TC because irrelevant features degrade performance. FS is usually performed by selecting the features with the highest scores according to certain measures. The disadvantage of these approaches is that the number of features to select must be fixed in advance; it is commonly defined by the percentage of words removed, which is called the Filtering Level (FL). It is therefore usual to run a set of experiments manually with several FLs chosen to represent all possible ones. This process does not guarantee that any of the chosen FLs is optimal, or even close to optimal. This paper addresses this difficulty by proposing a method that automatically determines optimal FLs by solving a univariate maximization problem.
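The idea in the description can be illustrated with a small sketch. The abstract does not say which optimizer or target function the authors use, so everything below is an assumption made for illustration: `apply_filtering_level` treats the FL as a fraction of features removed, `golden_section_max` is a generic golden-section search (not necessarily the paper's method) that assumes the target is unimodal on the interval, and `toy_performance` is a hypothetical stand-in for training and evaluating a classifier at a given FL.

```python
import math
from typing import Callable, Dict, List


def apply_filtering_level(scores: Dict[str, float], fl: float) -> List[str]:
    """Return the features kept after removing a fraction `fl` of them.

    `fl` is the filtering level as a fraction in [0, 1]; the features
    with the lowest scores are the ones removed.
    """
    if not 0.0 <= fl <= 1.0:
        raise ValueError("fl must lie in [0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = len(ranked) - int(round(fl * len(ranked)))
    return ranked[:n_keep]


def golden_section_max(
    f: Callable[[float], float],
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-3,
) -> float:
    """Approximate the maximizer of a unimodal function f on [lo, hi].

    Illustrative choice of univariate optimizer, not taken from the paper.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_performance(fl: float) -> float:
    """Hypothetical performance curve; real use would evaluate a classifier."""
    return -(fl - 0.8) ** 2


if __name__ == "__main__":
    best_fl = golden_section_max(toy_performance)
    print(f"estimated best filtering level: {best_fl:.3f}")
```

Each evaluation of the target function would in practice mean one full train-and-evaluate cycle, which is why a search that needs few evaluations is attractive compared with trying a fixed grid of FLs by hand.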
doi_str_mv 10.1007/11552253_22
format Conference Proceeding
fullrecord <record><control><sourceid>pascalfrancis_sprin</sourceid><recordid>TN_cdi_pascalfrancis_primary_17115860</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>17115860</sourcerecordid><originalsourceid>FETCH-LOGICAL-p219t-8b614bbdbd08152fe571c623a1c41044d164f77e2fe9b95a1968bd2236adeb883</originalsourceid><addsrcrecordid>eNpNkEtLAzEcxOMLrLUnv0AuHjys5p93jqVYFQo9WE8elmQ3W6Lb3SWJz09vS0U8DcwMA_ND6ALINRCibgCEoFSwktIDNDFKM8EJA0OlOUQjkAAFY9wc_WVUKyPUMRoRRmhhFGen6CylF0IIVYaO0POq_7CxTnj6lvuNzaHCtqvxcshhY1s8D232MXRrvPDvvk246SOee5vfosePvvVVDn2HQ4dX_jPjmc1-3cfwbXf2OTppbJv85FfH6Gl-u5rdF4vl3cNsuigGCiYX2kngztWuJhoEbbxQUEnKLFQcCOc1SN4o5beJcUZYMFK7mlImbe2d1myMLve7g02VbZtouyqkcojbC_GrBLXFpiXZ9q72vTTsLvlYur5_TSWQcke3_EeX_QDDk2dA</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization</title><source>Springer Books</source><creator>Montañés, E. ; Combarro, E. F. ; Díaz, I. ; Ranilla, J.</creator><contributor>Peña, José M. ; Siebes, Arno ; Feelders, Ad ; Famili, A. Fazel ; Kok, Joost N.</contributor><creatorcontrib>Montañés, E. ; Combarro, E. F. ; Díaz, I. ; Ranilla, J. ; Peña, José M. ; Siebes, Arno ; Feelders, Ad ; Famili, A. Fazel ; Kok, Joost N.</creatorcontrib><description>Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) becomes a crucial task, because of the presence of irrelevant features causing a loss in the performance. FS is usually performed selecting the features with highest score according to certain measures. However, the disadvantage of these approaches is that they need to determine in advance the number of features that are selected, commonly defined by the percentage of words removed, which is called Filtering Level (FL). 
In view of that, it is usual to carry out a set of experiments manually taking several FLs representing all possible ones. This process does not guarantee that any of the FLs chosen are the optimal ones, even not an approximation. This paper deals with overcoming this difficulty proposing a method that automatically determines optimal FLs by means of solving a univariate maximization problem.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540287957</identifier><identifier>ISBN: 3540287957</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 9783540319269</identifier><identifier>EISBN: 3540319263</identifier><identifier>DOI: 10.1007/11552253_22</identifier><language>eng</language><publisher>Berlin, Heidelberg: Springer Berlin Heidelberg</publisher><subject>Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Data processing. List processing. Character string processing ; Exact sciences and technology ; Feature Selection ; Feature Subset ; Information Gain ; Information Retrieval ; Information systems. Data bases ; Memory organisation. Data processing ; Software ; Speech and sound recognition and synthesis. 
Linguistics ; Target Function</subject><ispartof>Advances in Intelligent Data Analysis VI, 2005, p.239-248</ispartof><rights>Springer-Verlag Berlin Heidelberg 2005</rights><rights>2005 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/11552253_22$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/11552253_22$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,779,780,784,789,790,793,4050,4051,27925,38255,41442,42511</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=17115860$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Peña, José M.</contributor><contributor>Siebes, Arno</contributor><contributor>Feelders, Ad</contributor><contributor>Famili, A. Fazel</contributor><contributor>Kok, Joost N.</contributor><creatorcontrib>Montañés, E.</creatorcontrib><creatorcontrib>Combarro, E. F.</creatorcontrib><creatorcontrib>Díaz, I.</creatorcontrib><creatorcontrib>Ranilla, J.</creatorcontrib><title>Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization</title><title>Advances in Intelligent Data Analysis VI</title><description>Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) becomes a crucial task, because of the presence of irrelevant features causing a loss in the performance. FS is usually performed selecting the features with highest score according to certain measures. 
However, the disadvantage of these approaches is that they need to determine in advance the number of features that are selected, commonly defined by the percentage of words removed, which is called Filtering Level (FL). In view of that, it is usual to carry out a set of experiments manually taking several FLs representing all possible ones. This process does not guarantee that any of the FLs chosen are the optimal ones, even not an approximation. This paper deals with overcoming this difficulty proposing a method that automatically determines optimal FLs by means of solving a univariate maximization problem.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Data processing. List processing. Character string processing</subject><subject>Exact sciences and technology</subject><subject>Feature Selection</subject><subject>Feature Subset</subject><subject>Information Gain</subject><subject>Information Retrieval</subject><subject>Information systems. Data bases</subject><subject>Memory organisation. Data processing</subject><subject>Software</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Target Function</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540287957</isbn><isbn>3540287957</isbn><isbn>9783540319269</isbn><isbn>3540319263</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNpNkEtLAzEcxOMLrLUnv0AuHjys5p93jqVYFQo9WE8elmQ3W6Lb3SWJz09vS0U8DcwMA_ND6ALINRCibgCEoFSwktIDNDFKM8EJA0OlOUQjkAAFY9wc_WVUKyPUMRoRRmhhFGen6CylF0IIVYaO0POq_7CxTnj6lvuNzaHCtqvxcshhY1s8D232MXRrvPDvvk246SOee5vfosePvvVVDn2HQ4dX_jPjmc1-3cfwbXf2OTppbJv85FfH6Gl-u5rdF4vl3cNsuigGCiYX2kngztWuJhoEbbxQUEnKLFQcCOc1SN4o5beJcUZYMFK7mlImbe2d1myMLve7g02VbZtouyqkcojbC_GrBLXFpiXZ9q72vTTsLvlYur5_TSWQcke3_EeX_QDDk2dA</recordid><startdate>2005</startdate><enddate>2005</enddate><creator>Montañés, E.</creator><creator>Combarro, E. F.</creator><creator>Díaz, I.</creator><creator>Ranilla, J.</creator><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>IQODW</scope></search><sort><creationdate>2005</creationdate><title>Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization</title><author>Montañés, E. ; Combarro, E. F. ; Díaz, I. ; Ranilla, J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p219t-8b614bbdbd08152fe571c623a1c41044d164f77e2fe9b95a1968bd2236adeb883</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Data processing. List processing. Character string processing</topic><topic>Exact sciences and technology</topic><topic>Feature Selection</topic><topic>Feature Subset</topic><topic>Information Gain</topic><topic>Information Retrieval</topic><topic>Information systems. Data bases</topic><topic>Memory organisation. 
Data processing</topic><topic>Software</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Target Function</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Montañés, E.</creatorcontrib><creatorcontrib>Combarro, E. F.</creatorcontrib><creatorcontrib>Díaz, I.</creatorcontrib><creatorcontrib>Ranilla, J.</creatorcontrib><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Montañés, E.</au><au>Combarro, E. F.</au><au>Díaz, I.</au><au>Ranilla, J.</au><au>Peña, José M.</au><au>Siebes, Arno</au><au>Feelders, Ad</au><au>Famili, A. Fazel</au><au>Kok, Joost N.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization</atitle><btitle>Advances in Intelligent Data Analysis VI</btitle><date>2005</date><risdate>2005</risdate><spage>239</spage><epage>248</epage><pages>239-248</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540287957</isbn><isbn>3540287957</isbn><eisbn>9783540319269</eisbn><eisbn>3540319263</eisbn><abstract>Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) becomes a crucial task, because of the presence of irrelevant features causing a loss in the performance. FS is usually performed selecting the features with highest score according to certain measures. However, the disadvantage of these approaches is that they need to determine in advance the number of features that are selected, commonly defined by the percentage of words removed, which is called Filtering Level (FL). In view of that, it is usual to carry out a set of experiments manually taking several FLs representing all possible ones. This process does not guarantee that any of the FLs chosen are the optimal ones, even not an approximation. 
This paper deals with overcoming this difficulty proposing a method that automatically determines optimal FLs by means of solving a univariate maximization problem.</abstract><cop>Berlin, Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/11552253_22</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0302-9743
ispartof Advances in Intelligent Data Analysis VI, 2005, p.239-248
issn 0302-9743
1611-3349
language eng
recordid cdi_pascalfrancis_primary_17115860
source Springer Books
subjects Applied sciences
Artificial intelligence
Computer science; control theory; systems
Data processing. List processing. Character string processing
Exact sciences and technology
Feature Selection
Feature Subset
Information Gain
Information Retrieval
Information systems. Data bases
Memory organisation. Data processing
Software
Speech and sound recognition and synthesis. Linguistics
Target Function
title Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T02%3A54%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Towards%20Automatic%20and%20Optimal%20Filtering%20Levels%20for%20Feature%20Selection%20in%20Text%20Categorization&rft.btitle=Advances%20in%20Intelligent%20Data%20Analysis%20VI&rft.au=Monta%C3%B1%C3%A9s,%20E.&rft.date=2005&rft.spage=239&rft.epage=248&rft.pages=239-248&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540287957&rft.isbn_list=3540287957&rft_id=info:doi/10.1007/11552253_22&rft_dat=%3Cpascalfrancis_sprin%3E17115860%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540319269&rft.eisbn_list=3540319263&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true