Creating taxonomies and training data in multiple languages

The problem of creating taxonomies of objects, particularly objects that can be represented as text in various languages, and of categorizing such objects, is addressed by a method for taking training documents generated in a first language, translating them to a target language, and then generatin...

Bibliographic details
Main authors: CHENG, KEH-SHIN FU; GATES, STEPHEN C
Format: Patent
Language: chi; eng
creator CHENG, KEH-SHIN FU
GATES, STEPHEN C
description The problem of creating taxonomies of objects, particularly objects that can be represented as text in various languages, and of categorizing such objects, is addressed by a method for taking training documents generated in a first language, translating them to a target language, and then generating from a plurality of training documents one or more sets of features representing one or more categories in the target language. The method includes the steps of: forming a first list of items such that each item in the first list represents a particular training document having an association with one or more elements related to a particular category; developing a second list from the first list by deleting one or more candidate documents that satisfy at least one deletion criterion; translating the documents in the second list from the source language to the target language; and extracting the one or more sets of features from the translated second list using one or more feature selection criteria.
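The four steps named in the description can be sketched roughly as follows. This is only an illustrative Python sketch, not the patented implementation: the record does not say which deletion criterion, translation system, or feature selection criterion is used, so the names `TrainingDoc`, `build_features`, `too_short`, `toy_translate`, and `GLOSSARY` are all hypothetical, the "translation" is a toy word-for-word glossary lookup, and the feature selection criterion (the most frequent tokens per category) is an assumption chosen for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrainingDoc:
    """One item of the first list: a training document tied to a category."""
    category: str
    text: str


def build_features(
    first_list: List[TrainingDoc],
    should_delete: Callable[[TrainingDoc], bool],
    translate: Callable[[str], str],
    top_k: int = 2,
) -> Dict[str, List[str]]:
    # Step 2: derive the second list by dropping documents that meet
    # at least one deletion criterion.
    second_list = [doc for doc in first_list if not should_delete(doc)]

    # Step 3: translate the remaining documents into the target language.
    translated = [TrainingDoc(d.category, translate(d.text)) for d in second_list]

    # Step 4: extract one feature set per category. Assumed criterion:
    # keep the top_k most frequent tokens (ties broken alphabetically).
    counts: Dict[str, Counter] = {}
    for doc in translated:
        counts.setdefault(doc.category, Counter()).update(doc.text.lower().split())
    return {
        category: [w for w, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for category, c in counts.items()
    }


# Hypothetical stand-ins for a real deletion criterion and translation system.
GLOSSARY = {"football": "fussball", "match": "spiel", "goal": "tor",
            "loan": "kredit", "interest": "zins"}


def too_short(doc: TrainingDoc) -> bool:
    """Assumed deletion criterion: fewer than two tokens."""
    return len(doc.text.split()) < 2


def toy_translate(text: str) -> str:
    """Word-for-word lookup; unknown words pass through unchanged."""
    return " ".join(GLOSSARY.get(word, word) for word in text.split())


# Step 1: the first list, each item associated with a category.
first_list = [
    TrainingDoc("sports", "football match football goal"),
    TrainingDoc("sports", "goal"),  # removed by the deletion criterion
    TrainingDoc("finance", "bank loan bank interest"),
]

features = build_features(first_list, too_short, toy_translate)
print(features)
```

In practice the callables passed as `should_delete` and `translate` would be replaced by whatever criteria and translation system a given deployment actually uses; keeping them as parameters leaves the four-step pipeline itself unchanged.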
format Patent
creationdate 2005
date 2005-06-16
oa free_for_read
sourcetype Open Access Repository
linktohtml https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20050616&DB=EPODOC&CC=TW&NR=200519645A
fulltext fulltext_linktorsrc
language chi ; eng
recordid cdi_epo_espacenet_TW200519645A
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title Creating taxonomies and training data in multiple languages
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T17%3A38%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHENG,%20KEH-SHIN%20FU&rft.date=2005-06-16&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ETW200519645A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true