Using WordNet to Complement Training Information in Text Categorization

Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the...

Detailed description

Bibliographic details
Published in: arXiv.org 1997-09
Main authors: Manuel de Buenaga Rodriguez; Gomez Hidalgo, Jose Maria; Belen Diaz Agudo
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Manuel de Buenaga Rodriguez
Gomez Hidalgo, Jose Maria
Belen Diaz Agudo
description Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our approach integrates WordNet information with two training approaches through the Vector Space Model. The training approaches we test are the Rocchio (relevance feedback) and the Widrow-Hoff (machine learning) algorithms. Results obtained from evaluation show that the integration of WordNet clearly outperforms training approaches, and that an integrated technique can effectively address the classification of low frequency categories.
format Article
publisher Ithaca: Cornell University Library, arXiv.org
creationdate 1997-09-17
rights 1997. This work is published under https://arxiv.org/licenses/assumed-1991-2003/license.html (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 1997-09
issn 2331-8422
language eng
recordid cdi_proquest_journals_2090461877
source Free E- Journals
subjects Algorithms
Classification
Machine learning
Text categorization
Training
title Using WordNet to Complement Training Information in Text Categorization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T04%3A19%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Using%20WordNet%20to%20Complement%20Training%20Information%20in%20Text%20Categorization&rft.jtitle=arXiv.org&rft.au=Manuel%20de%20Buenaga%20Rodriguez&rft.date=1997-09-17&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2090461877%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2090461877&rft_id=info:pmid/&rfr_iscdi=true
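
The description above names two training approaches, Rocchio and Widrow-Hoff, applied through the Vector Space Model. The following is a minimal sketch of the textbook forms of both update rules, not the authors' implementation. The record does not say how WordNet information is combined with training, so the `seed` argument (an initial weight vector built from lexical terms) is an assumption made here for illustration, as are the function names, the sparse-dictionary vector representation, and the default parameter values.

```python
from typing import Dict, List, Optional, Tuple

Vector = Dict[str, float]  # sparse term -> weight mapping (assumed representation)


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def rocchio(pos: List[Vector], neg: List[Vector],
            beta: float = 1.0, gamma: float = 1.0,
            seed: Optional[Vector] = None) -> Vector:
    """Textbook Rocchio prototype: seed + beta*mean(pos) - gamma*mean(neg).

    Negative weights are clipped to zero. The seed vector is a hypothetical
    hook for lexical-database terms; it is not taken from the paper.
    """
    w: Vector = dict(seed or {})
    for docs, coef in ((pos, beta), (neg, -gamma)):
        for d in docs:
            for t, x in d.items():
                w[t] = w.get(t, 0.0) + coef * x / len(docs)
    return {t: max(0.0, v) for t, v in w.items()}


def widrow_hoff(examples: List[Tuple[Vector, float]], eta: float = 0.25,
                seed: Optional[Vector] = None) -> Vector:
    """Textbook Widrow-Hoff (LMS) rule: w <- w - 2*eta*(w.x - y)*x.

    One pass over (document vector, label) pairs, starting from the
    hypothetical seed vector or from zero.
    """
    w: Vector = dict(seed or {})
    for x, y in examples:
        err = dot(w, x) - y
        for t, xv in x.items():
            w[t] = w.get(t, 0.0) - 2.0 * eta * err * xv
    return w


# Hypothetical category seed and toy documents, for illustration only.
seed = {"wheat": 1.0, "grain": 1.0}
pos_docs = [{"wheat": 1.0, "export": 1.0}]
neg_docs = [{"oil": 1.0, "export": 1.0}]
prototype = rocchio(pos_docs, neg_docs, seed=seed)
lms_weights = widrow_hoff([({"wheat": 1.0}, 1.0)])
```

In this sketch a seeded term such as `grain` keeps a positive weight even though it never occurs in the training documents, which is one plausible way an external lexical resource could help categories with few training examples.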