Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features
Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem...
Main authors: | Liuling Dai, Jinwu Hu, WanChun Liu |
---|---|
Format: | Conference proceeding |
Language: | eng |
container_end_page | 185 |
---|---|
container_issue | |
container_start_page | 182 |
container_title | |
container_volume | 1 |
creator | Liuling Dai ; Jinwu Hu ; WanChun Liu |
description | Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough set theory to solve this problem. The features of each category are selected in an aggressive manner. The classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm clearly improves classification precision and is therefore promising. |
doi_str_mv | 10.1109/ISCID.2008.178 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4725586</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4725586</ieee_id><sourcerecordid>4725586</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-d950337a7d5369dd20732b1c3ff25111a8551ab03f6a5ae9ac609fd051e60dd53</originalsourceid><addsrcrecordid>eNotjM1KAzEURgNS0NZu3bi5L9CaOzHJZCmjtQMtQn92QrltbtqIzuhMBq1Pb0W_zYED5xPiCuUYUbqbclmU9-NMynyMNj8TfWmN00ohmp7o_3qX5WjMuRi27Ys8TTmVW30hntdtrPYwr30MkT0U0xKWHx01DFR5WNTd_gBLThDqBlb8laCgxPu6id-UYl3BZ0wHmFN1hAX7rvJUJZgwpa7h9lL0Ar22PPznQKwnD6tiOpo9PZbF3WwU0eo08k5LpSxZr5Vx3mfSqmyLOxVCphGRcq2RtlIFQ5rY0c5IF7zUyEb6UzQQ13-_kZk37018o-a4ubWZ1rlRP50SUlA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Liuling Dai ; Jinwu Hu ; WanChun Liu</creator><creatorcontrib>Liuling Dai ; Jinwu Hu ; WanChun Liu</creatorcontrib><description>Text categorization is a key problem of text mining. Although there are many research on this problem, the main works are focused on classification of big categories. There are very few researches on text categorization problems characterised by many redundant features. We call this kind of problem as fine-text-categorization. In this paper, we presented an algorithm based on modified CHI square feature selection and rough set to solve this problem. The features of categories are selected in a aggressive manner. The classification rules are extracted by using rough set theory. 
Experiments on real world corpora show that our algorithm can evidently improve classification precision, thus is promising.</description><identifier>ISBN: 0769533116</identifier><identifier>ISBN: 9780769533117</identifier><identifier>DOI: 10.1109/ISCID.2008.178</identifier><identifier>LCCN: 2008928166</identifier><language>eng</language><publisher>IEEE</publisher><subject>Competitive intelligence ; Computational intelligence ; Computer science ; feature selection ; Information retrieval ; Information technology ; Laboratories ; Machine learning algorithms ; Partial response channels ; rough set ; Support vector machines ; SVM ; Text categorization</subject><ispartof>2008 International Symposium on Computational Intelligence and Design, 2008, Vol.1, p.182-185</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4725586$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2051,27904,54899</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4725586$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Liuling Dai</creatorcontrib><creatorcontrib>Jinwu Hu</creatorcontrib><creatorcontrib>WanChun Liu</creatorcontrib><title>Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features</title><title>2008 International Symposium on Computational Intelligence and Design</title><addtitle>ISCID</addtitle><description>Text categorization is a key problem of text mining. Although there are many research on this problem, the main works are focused on classification of big categories. There are very few researches on text categorization problems characterised by many redundant features. We call this kind of problem as fine-text-categorization. 
In this paper, we presented an algorithm based on modified CHI square feature selection and rough set to solve this problem. The features of categories are selected in a aggressive manner. The classification rules are extracted by using rough set theory. Experiments on real world corpora show that our algorithm can evidently improve classification precision, thus is promising.</description><subject>Competitive intelligence</subject><subject>Computational intelligence</subject><subject>Computer science</subject><subject>feature selection</subject><subject>Information retrieval</subject><subject>Information technology</subject><subject>Laboratories</subject><subject>Machine learning algorithms</subject><subject>Partial response channels</subject><subject>rough set</subject><subject>Support vector machines</subject><subject>SVM</subject><subject>Text categorization</subject><isbn>0769533116</isbn><isbn>9780769533117</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2008</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotjM1KAzEURgNS0NZu3bi5L9CaOzHJZCmjtQMtQn92QrltbtqIzuhMBq1Pb0W_zYED5xPiCuUYUbqbclmU9-NMynyMNj8TfWmN00ohmp7o_3qX5WjMuRi27Ys8TTmVW30hntdtrPYwr30MkT0U0xKWHx01DFR5WNTd_gBLThDqBlb8laCgxPu6id-UYl3BZ0wHmFN1hAX7rvJUJZgwpa7h9lL0Ar22PPznQKwnD6tiOpo9PZbF3WwU0eo08k5LpSxZr5Vx3mfSqmyLOxVCphGRcq2RtlIFQ5rY0c5IF7zUyEb6UzQQ13-_kZk37018o-a4ubWZ1rlRP50SUlA</recordid><startdate>200810</startdate><enddate>200810</enddate><creator>Liuling Dai</creator><creator>Jinwu Hu</creator><creator>WanChun Liu</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200810</creationdate><title>Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features</title><author>Liuling Dai ; Jinwu Hu ; WanChun 
Liu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-d950337a7d5369dd20732b1c3ff25111a8551ab03f6a5ae9ac609fd051e60dd53</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Competitive intelligence</topic><topic>Computational intelligence</topic><topic>Computer science</topic><topic>feature selection</topic><topic>Information retrieval</topic><topic>Information technology</topic><topic>Laboratories</topic><topic>Machine learning algorithms</topic><topic>Partial response channels</topic><topic>rough set</topic><topic>Support vector machines</topic><topic>SVM</topic><topic>Text categorization</topic><toplevel>online_resources</toplevel><creatorcontrib>Liuling Dai</creatorcontrib><creatorcontrib>Jinwu Hu</creatorcontrib><creatorcontrib>WanChun Liu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Liuling Dai</au><au>Jinwu Hu</au><au>WanChun Liu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features</atitle><btitle>2008 International Symposium on Computational Intelligence and Design</btitle><stitle>ISCID</stitle><date>2008-10</date><risdate>2008</risdate><volume>1</volume><spage>182</spage><epage>185</epage><pages>182-185</pages><isbn>0769533116</isbn><isbn>9780769533117</isbn><abstract>Text categorization is a key problem of text mining. 
Although there are many research on this problem, the main works are focused on classification of big categories. There are very few researches on text categorization problems characterised by many redundant features. We call this kind of problem as fine-text-categorization. In this paper, we presented an algorithm based on modified CHI square feature selection and rough set to solve this problem. The features of categories are selected in a aggressive manner. The classification rules are extracted by using rough set theory. Experiments on real world corpora show that our algorithm can evidently improve classification precision, thus is promising.</abstract><pub>IEEE</pub><doi>10.1109/ISCID.2008.178</doi><tpages>4</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 0769533116 |
ispartof | 2008 International Symposium on Computational Intelligence and Design, 2008, Vol.1, p.182-185 |
issn | |
language | eng |
recordid | cdi_ieee_primary_4725586 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Competitive intelligence ; Computational intelligence ; Computer science ; feature selection ; Information retrieval ; Information technology ; Laboratories ; Machine learning algorithms ; Partial response channels ; rough set ; Support vector machines ; SVM ; Text categorization |
title | Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T09%3A01%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Using%20Modified%20CHI%20Square%20and%20Rough%20Set%20for%20Text%20Categorization%20with%20Many%20Redundant%20Features&rft.btitle=2008%20International%20Symposium%20on%20Computational%20Intelligence%20and%20Design&rft.au=Liuling%20Dai&rft.date=2008-10&rft.volume=1&rft.spage=182&rft.epage=185&rft.pages=182-185&rft.isbn=0769533116&rft.isbn_list=9780769533117&rft_id=info:doi/10.1109/ISCID.2008.178&rft_dat=%3Cieee_6IE%3E4725586%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4725586&rfr_iscdi=true |
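The description in this record names CHI square feature selection as the first stage of the algorithm, but it does not say how the authors modified the statistic. As a rough illustration only, the sketch below computes the standard (unmodified) chi-square score of each term for one category from a 2x2 document-count table and keeps the top-scoring terms. The function names `chi_square_scores` and `top_k_features`, the token-list input format, and the toy corpus are assumptions made for this example; none of them come from the paper, and the rough set rule extraction stage is not shown.

```python
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Standard chi-square score of every term with respect to one category.

    docs:   list of token lists (one list per document)
    labels: list of category labels, aligned with docs
    NOTE: this is the textbook statistic, not the paper's modified version.
    """
    n = len(docs)
    n_cat = sum(1 for y in labels if y == category)

    df_all = Counter()  # documents containing the term
    df_cat = Counter()  # documents of `category` containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if y == category:
                df_cat[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_cat[term]       # in category, term present
        b = df - a             # other categories, term present
        c = n_cat - a          # in category, term absent
        d = n - n_cat - b      # other categories, term absent
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k_features(docs, labels, category, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels, category)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration.
docs = [["ball", "goal"], ["ball", "team"], ["vote", "law"], ["vote", "team"]]
labels = ["sport", "sport", "politics", "politics"]
scores = chi_square_scores(docs, labels, "sport")
selected = top_k_features(docs, labels, "sport", 2)
```

Note that the plain statistic scores a term highly whether it indicates the category or its absence: in the toy corpus, "vote" scores as high for "sport" as "ball" does, while "team", which appears equally in both categories, scores zero. Choosing a small `k` is one simple way to approximate the aggressive selection the description mentions.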