Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features

Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem...

Bibliographic details
Main authors: Liuling Dai, Jinwu Hu, WanChun Liu
Format: Conference proceeding
Language: eng
container_end_page 185
container_issue
container_start_page 182
container_title
container_volume 1
creator Liuling Dai
Jinwu Hu
WanChun Liu
description Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough set theory to solve this problem. The features of categories are selected in an aggressive manner. The classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm clearly improves classification precision and is therefore promising.
doi_str_mv 10.1109/ISCID.2008.178
format Conference Proceeding
fullrecord IEEE Xplore record 4725586 (cdi_ieee_primary_4725586); conference proceeding; publisher IEEE; published 2008-10; 4 pages; additional identifiers: ISBN 9780769533117, LCCN 2008928166; record link: https://ieeexplore.ieee.org/document/4725586
fulltext fulltext_linktorsrc
identifier ISBN: 0769533116
ispartof 2008 International Symposium on Computational Intelligence and Design, 2008, Vol.1, p.182-185
issn
language eng
recordid cdi_ieee_primary_4725586
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Competitive intelligence
Computational intelligence
Computer science
feature selection
Information retrieval
Information technology
Laboratories
Machine learning algorithms
Partial response channels
rough set
Support vector machines
SVM
Text categorization
title Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T09%3A01%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Using%20Modified%20CHI%20Square%20and%20Rough%20Set%20for%20Text%20Categorization%20with%20Many%20Redundant%20Features&rft.btitle=2008%20International%20Symposium%20on%20Computational%20Intelligence%20and%20Design&rft.au=Liuling%20Dai&rft.date=2008-10&rft.volume=1&rft.spage=182&rft.epage=185&rft.pages=182-185&rft.isbn=0769533116&rft.isbn_list=9780769533117&rft_id=info:doi/10.1109/ISCID.2008.178&rft_dat=%3Cieee_6IE%3E4725586%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4725586&rfr_iscdi=true