Making good choices of non-redundant n-gram words
This paper proposes a complete method for automatically selecting good, non-redundant n-gram words as attributes for textual data. Generally, n-gram words with n ≥ 2 are used to improve the subjective interpretability of a text mining task. In these cases, the n...
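Main authors: | Moura, Maria Fernanda; Nogueira, Bruno Magalhaes; da Silva Conrado, Merley; dos Santos, Fabiano Fernandes; Rezende, Solange Oliveira |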
---|---|
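Format: | Conference proceeding |
Language: | eng |
Subjects: | Artificial intelligence; Data mining; Decision making; Frequency estimation; Machine learning; Manuals; Mathematics; Proposals; Supervised learning; Text mining |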
container_end_page | 71 |
---|---|
container_issue | |
container_start_page | 64 |
container_title | |
container_volume | |
creator | Moura, Maria Fernanda; Nogueira, Bruno Magalhaes; da Silva Conrado, Merley; dos Santos, Fabiano Fernandes; Rezende, Solange Oliveira |
description | This paper proposes a complete method for automatically selecting good, non-redundant n-gram words as attributes for textual data. Generally, n-gram words with n ≥ 2 are used to improve the subjective interpretability of a text mining task. In these cases, the n-gram words are statistically generated and selected, which always introduces redundancy. The proposed method eliminates only the redundancies. This is shown by comparing classifiers trained on the original and the non-redundant data sets: categorization effectiveness does not decrease. Additionally, the method is useful for any kind of machine learning process applied to a text mining task. |
doi_str_mv | 10.1109/ICCITECHN.2008.4803111 |
format | Conference Proceeding |
fullrecord | Source record (XML, condensed to fields not listed elsewhere in this table): publisher IEEE; date 2008-12; ISBN 1424421357, 9781424421350; EISBN 1424421365, 9781424421367; LCCN 2008900703; record type conference_proceeding; short title ICCITECHN; IEEE ID 4803111 |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 1424421357 |
ispartof | 2008 11th International Conference on Computer and Information Technology, 2008, p.64-71 |
issn | |
language | eng |
recordid | cdi_ieee_primary_4803111 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Artificial intelligence; Data mining; Decision making; Frequency estimation; Machine learning; Manuals; Mathematics; Proposals; Supervised learning; Text mining |
title | Making good choices of non-redundant n-gram words |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T16%3A59%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Making%20good%20choices%20of%20non-redundant%20n-gramwords&rft.btitle=2008%2011th%20International%20Conference%20on%20Computer%20and%20Information%20Technology&rft.au=Moura,%20Maria%20Fernanda&rft.date=2008-12&rft.spage=64&rft.epage=71&rft.pages=64-71&rft.isbn=1424421357&rft.isbn_list=9781424421350&rft_id=info:doi/10.1109/ICCITECHN.2008.4803111&rft_dat=%3Cieee_6IE%3E4803111%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424421365&rft.eisbn_list=9781424421367&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4803111&rfr_iscdi=true |
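
The description field above states that n-gram words with n ≥ 2 are generated statistically and that redundant ones are then eliminated, but this record does not say how. The following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple, hypothetical redundancy rule: a shorter n-gram is dropped when a retained longer n-gram contains it and both occur equally often. The function names `ngram_counts` and `drop_redundant` and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def ngram_counts(docs: List[List[str]], max_n: int = 2) -> Dict[NGram, int]:
    """Count every n-gram of length 1..max_n over tokenized documents."""
    counts: Counter = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return dict(counts)


def drop_redundant(counts: Dict[NGram, int], min_count: int = 2) -> Dict[NGram, int]:
    """Keep frequent n-grams, then drop shorter ones subsumed by a longer one.

    Assumed rule (hypothetical, for illustration only): a shorter n-gram is
    redundant if some frequent longer n-gram contains it contiguously and
    has exactly the same count.
    """
    frequent = {g: c for g, c in counts.items() if c >= min_count}
    redundant = set()
    for gram, count in frequent.items():
        n = len(gram)
        for size in range(1, n):
            for i in range(n - size + 1):
                sub = gram[i:i + size]
                if frequent.get(sub) == count:
                    redundant.add(sub)
    return {g: c for g, c in frequent.items() if g not in redundant}


docs = [
    ["text", "mining", "task"],
    ["text", "mining", "method"],
    ["data", "mining"],
]
kept = drop_redundant(ngram_counts(docs, max_n=2), min_count=2)
for gram, count in sorted(kept.items()):
    print(" ".join(gram), count)
```

In this toy corpus the unigram "text" only ever occurs inside "text mining", so it is treated as redundant, while "mining" also occurs elsewhere and is kept. A real attribute selection step would more likely rely on statistical association tests than on raw count equality.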