Using WordNet to Complement Training Information in Text Categorization
Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the...
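Published in: | arXiv.org 1997-09 |
---|---|
Main authors: | Manuel de Buenaga Rodriguez ; Gomez Hidalgo, Jose Maria ; Belen Diaz Agudo |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Classification ; Machine learning ; Text categorization ; Training |
Online access: | Full text |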
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Manuel de Buenaga Rodriguez ; Gomez Hidalgo, Jose Maria ; Belen Diaz Agudo |
description | Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our approach integrates WordNet information with two training approaches through the Vector Space Model. The training approaches we test are the Rocchio (relevance feedback) and the Widrow-Hoff (machine learning) algorithms. Results obtained from evaluation show that the integration of WordNet clearly outperforms training approaches, and that an integrated technique can effectively address the classification of low frequency categories. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2090461877</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2090461877</sourcerecordid><originalsourceid>FETCH-proquest_journals_20904618773</originalsourceid><addsrcrecordid>eNqNiksKwjAUAIMgWLR3CLgupEl_rou_jauKyxJoWlLa92ryCuLp_eABXA3MzIIFUqk4KhIpVyz0vhdCyCyXaaoCdrx6Cx2_oWsuhjghL3GcBjMaIF45beGTz9CiGzVZBG6BV-ZBvNRkOnT2-dUbtmz14E3445ptD_uqPEWTw_tsPNU9zg7eqZZiJ5IsLvJc_Xe9ABAvO-U</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2090461877</pqid></control><display><type>article</type><title>Using WordNet to Complement Training Information in Text Categorization</title><source>Free E- Journals</source><creator>Manuel de Buenaga Rodriguez ; Gomez Hidalgo, Jose Maria ; Belen Diaz Agudo</creator><creatorcontrib>Manuel de Buenaga Rodriguez ; Gomez Hidalgo, Jose Maria ; Belen Diaz Agudo</creatorcontrib><description>Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our approach integrates WordNet information with two training approaches through the Vector Space Model. The training approaches we test are the Rocchio (relevance feedback) and the Widrow-Hoff (machine learning) algorithms. 
Results obtained from evaluation show that the integration of WordNet clearly outperforms training approaches, and that an integrated technique can effectively address the classification of low frequency categories.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Classification ; Machine learning ; Text categorization ; Training</subject><ispartof>arXiv.org, 1997-09</ispartof><rights>1997. This work is published under https://arxiv.org/licenses/assumed-1991-2003/license.html (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Manuel de Buenaga Rodriguez</creatorcontrib><creatorcontrib>Gomez Hidalgo, Jose Maria</creatorcontrib><creatorcontrib>Belen Diaz Agudo</creatorcontrib><title>Using WordNet to Complement Training Information in Text Categorization</title><title>arXiv.org</title><description>Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our approach integrates WordNet information with two training approaches through the Vector Space Model. The training approaches we test are the Rocchio (relevance feedback) and the Widrow-Hoff (machine learning) algorithms. 
Results obtained from evaluation show that the integration of WordNet clearly outperforms training approaches, and that an integrated technique can effectively address the classification of low frequency categories.</description><subject>Algorithms</subject><subject>Classification</subject><subject>Machine learning</subject><subject>Text categorization</subject><subject>Training</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1997</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNiksKwjAUAIMgWLR3CLgupEl_rou_jauKyxJoWlLa92ryCuLp_eABXA3MzIIFUqk4KhIpVyz0vhdCyCyXaaoCdrx6Cx2_oWsuhjghL3GcBjMaIF45beGTz9CiGzVZBG6BV-ZBvNRkOnT2-dUbtmz14E3445ptD_uqPEWTw_tsPNU9zg7eqZZiJ5IsLvJc_Xe9ABAvO-U</recordid><startdate>19970917</startdate><enddate>19970917</enddate><creator>Manuel de Buenaga Rodriguez</creator><creator>Gomez Hidalgo, Jose Maria</creator><creator>Belen Diaz Agudo</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>19970917</creationdate><title>Using WordNet to Complement Training Information in Text Categorization</title><author>Manuel de Buenaga Rodriguez ; Gomez Hidalgo, Jose Maria ; Belen Diaz Agudo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20904618773</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1997</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Machine learning</topic><topic>Text 
categorization</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Manuel de Buenaga Rodriguez</creatorcontrib><creatorcontrib>Gomez Hidalgo, Jose Maria</creatorcontrib><creatorcontrib>Belen Diaz Agudo</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Manuel de Buenaga Rodriguez</au><au>Gomez Hidalgo, Jose Maria</au><au>Belen Diaz Agudo</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Using WordNet to Complement Training Information in Text Categorization</atitle><jtitle>arXiv.org</jtitle><date>1997-09-17</date><risdate>1997</risdate><eissn>2331-8422</eissn><abstract>Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection.
We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our approach integrates WordNet information with two training approaches through the Vector Space Model. The training approaches we test are the Rocchio (relevance feedback) and the Widrow-Hoff (machine learning) algorithms. Results obtained from evaluation show that the integration of WordNet clearly outperforms training approaches, and that an integrated technique can effectively address the classification of low frequency categories.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 1997-09 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2090461877 |
source | Free E- Journals |
subjects | Algorithms ; Classification ; Machine learning ; Text categorization ; Training |
title | Using WordNet to Complement Training Information in Text Categorization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T04%3A19%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Using%20WordNet%20to%20Complement%20Training%20Information%20in%20Text%20Categorization&rft.jtitle=arXiv.org&rft.au=Manuel%20de%20Buenaga%20Rodriguez&rft.date=1997-09-17&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2090461877%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2090461877&rft_id=info:pmid/&rfr_iscdi=true |
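
The description field names two training algorithms, Rocchio and Widrow-Hoff, applied to term-weight vectors in the Vector Space Model, but the record gives no formulas. The following is a minimal sketch of the commonly cited textbook forms of both updates, not the authors' implementation. The `prior` vector is a hypothetical stand-in for weights derived from a lexical database such as WordNet, and the way it enters each update (as an additive term in Rocchio, as the starting weights in Widrow-Hoff) is an assumption, as are the parameter names `alpha`, `beta`, `gamma`, and `eta`.

```python
# Sketch only: textbook Rocchio and Widrow-Hoff updates for one category.
# `prior` is a hypothetical lexical-database weight vector (an assumption,
# not taken from the catalog record).

def dot(u, v):
    """Inner product of two equal-length term-weight vectors."""
    return sum(a * b for a, b in zip(u, v))

def rocchio(prior, docs, labels, alpha=1.0, beta=1.0, gamma=1.0):
    """Category vector: prior plus mean of positive docs minus mean of
    negative docs, with negative weights clipped to zero."""
    pos = [d for d, y in zip(docs, labels) if y == 1]
    neg = [d for d, y in zip(docs, labels) if y == 0]
    weights = []
    for j in range(len(prior)):
        pos_mean = sum(d[j] for d in pos) / len(pos) if pos else 0.0
        neg_mean = sum(d[j] for d in neg) / len(neg) if neg else 0.0
        weights.append(max(0.0, alpha * prior[j] + beta * pos_mean - gamma * neg_mean))
    return weights

def widrow_hoff(prior, docs, labels, eta=0.1):
    """Online least-mean-squares update, one pass over the training docs,
    starting from the prior instead of the zero vector."""
    w = list(prior)
    for x, y in zip(docs, labels):
        err = dot(w, x) - y  # signed prediction error on this document
        w = [wj - 2.0 * eta * err * xj for wj, xj in zip(w, x)]
    return w

# Toy data: three terms, one positive and one negative training document.
prior = [1.0, 0.0, 0.0]
docs = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = [1, 0]

rocchio_weights = rocchio(prior, docs, labels)
wh_weights = widrow_hoff([0.0, 0.0, 0.0], docs, labels)
```

A new document would then be scored against a category by taking `dot(weights, document_vector)`. A prior of this kind could matter most for low-frequency categories, where few positive training documents are available to shape the weights.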