AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS

An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature...
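
The segmentation step in the abstract can be illustrated with a small sketch. The record does not say which matching strategy is used, so greedy forward maximum matching is an assumption here, and the function names and sample word lists are hypothetical.

```python
def build_composite_dictionary(conventional, adaptive):
    """Combine a general-purpose word list with a document-specific one."""
    return set(conventional) | set(adaptive)


def segment(text, dictionary):
    """Greedy forward maximum matching (assumed strategy, not from the record)."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical word lists, for illustration only.
conventional = {"steel", "pipe", "frozen", "fish"}
adaptive = {"kg"}  # terms specific to the kind of document being processed
composite = build_composite_dictionary(conventional, adaptive)

print(segment("frozenfish20kg", composite))
# ['frozen', 'fish', '2', '0', 'kg']
```

Without the adaptive entry, "kg" would fall apart into single characters, which is the kind of gap the adaptive dictionary is meant to close.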

Bibliographic details
Main authors: Cheng, Xiaopei; Wu, Yikai; Hou, Fang; Ding, Sifei
Format: Patent
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Cheng, Xiaopei
Wu, Yikai
Hou, Fang
Ding, Sifei
description An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2018365322A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2018365322A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2018365322A13</originalsourceid><addsrcrecordid>eNqNi8sKwjAQRbNxIeo_DLgWbIPS7Zgm7UAekkzBXSkaV6KF-v9YxA9wdbiHe5bihh0Hh0wK9IUjKqbgIRhAmBd58g2oEM9dAhPibGtkBGUxJTKkI5ww6RrmxqFqyWuwGuM3Q9uESNy6tBaL-_CY8ubHldgazard5fHV52kcrvmZ332Xyn1RyeNBliUW8r_XB-8wNRc</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS</title><source>esp@cenet</source><creator>Cheng, Xiaopei ; Wu, Yikai ; Hou, Fang ; Ding, Sifei</creator><creatorcontrib>Cheng, Xiaopei ; Wu, Yikai ; Hou, Fang ; Ding, Sifei</creatorcontrib><description>An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. 
The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.</description><language>eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS</subject><creationdate>2018</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20181220&amp;DB=EPODOC&amp;CC=US&amp;NR=2018365322A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,777,882,25545,76296</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20181220&amp;DB=EPODOC&amp;CC=US&amp;NR=2018365322A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Cheng, Xiaopei</creatorcontrib><creatorcontrib>Wu, Yikai</creatorcontrib><creatorcontrib>Hou, Fang</creatorcontrib><creatorcontrib>Ding, Sifei</creatorcontrib><title>AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS</title><description>An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. 
The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2018</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNqNi8sKwjAQRbNxIeo_DLgWbIPS7Zgm7UAekkzBXSkaV6KF-v9YxA9wdbiHe5bihh0Hh0wK9IUjKqbgIRhAmBd58g2oEM9dAhPibGtkBGUxJTKkI5ww6RrmxqFqyWuwGuM3Q9uESNy6tBaL-_CY8ubHldgazard5fHV52kcrvmZ332Xyn1RyeNBliUW8r_XB-8wNRc</recordid><startdate>20181220</startdate><enddate>20181220</enddate><creator>Cheng, Xiaopei</creator><creator>Wu, Yikai</creator><creator>Hou, Fang</creator><creator>Ding, Sifei</creator><scope>EVB</scope></search><sort><creationdate>20181220</creationdate><title>AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS</title><author>Cheng, Xiaopei ; Wu, Yikai ; Hou, Fang ; Ding, Sifei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2018365322A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2018</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>Cheng, Xiaopei</creatorcontrib><creatorcontrib>Wu, 
Yikai</creatorcontrib><creatorcontrib>Hou, Fang</creatorcontrib><creatorcontrib>Ding, Sifei</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cheng, Xiaopei</au><au>Wu, Yikai</au><au>Hou, Fang</au><au>Ding, Sifei</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS</title><date>2018-12-20</date><risdate>2018</risdate><abstract>An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US2018365322A1
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T04%3A28%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Cheng,%20Xiaopei&rft.date=2018-12-20&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2018365322A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
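
The description field of this record outlines two further steps: extracting labelled samples by detecting class signatures, and expanding the corpus as new documents are classified. The following is a minimal sketch of that loop under stated assumptions. The regular-expression signatures, the token-count classifier, the confidence threshold, and all names are hypothetical stand-ins rather than the patented method, and the strings are assumed to be already segmented into space-separated tokens. The split into training and testing samples is omitted for brevity.

```python
import re
from collections import Counter, defaultdict

# Hypothetical class signatures: a regex match labels the string.
SIGNATURES = {
    "food": re.compile(r"\b(frozen|fresh)\b"),
    "metal": re.compile(r"\b(steel|copper)\b"),
}


def extract_corpus(strings):
    """Label strings that match exactly one class signature; skip the rest."""
    corpus = []
    for s in strings:
        hits = [label for label, pattern in SIGNATURES.items() if pattern.search(s)]
        if len(hits) == 1:
            corpus.append((s, hits[0]))
    return corpus


def train(corpus):
    """Count tokens per class (a toy stand-in for a real learning algorithm)."""
    counts = defaultdict(Counter)
    for text, label in corpus:
        counts[label].update(text.split())
    return counts


def classify(counts, text):
    """Return (label, score), where score is the share of token votes won."""
    votes = Counter()
    for token in text.split():
        for label, token_counts in counts.items():
            votes[label] += token_counts[token]
    total = sum(votes.values())
    if total == 0:
        return None, 0.0
    label, best = votes.most_common(1)[0]
    return label, best / total


def process_new_document(corpus, strings, threshold=0.8):
    """Classify new strings, add the confident ones to the corpus, retrain."""
    counts = train(corpus)
    for s in strings:
        label, score = classify(counts, s)
        if label is not None and score >= threshold:
            corpus.append((s, label))
    return train(corpus)


corpus = extract_corpus(["frozen fish fillet", "steel pipe fittings", "wooden chair"])
model = process_new_document(corpus, ["fish fillet smoked", "pipe clamp"])
print(len(corpus), classify(model, "smoked fish"))
```

The string "wooden chair" matches no signature and is left out of the initial corpus. The two new strings contain no signature word, yet are classified from tokens learned earlier and then added to the corpus, so "smoked" becomes a known token for later documents.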