AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS

Bibliographic details
Main authors:	Cheng, Xiaopei; Wu, Yikai; Hou, Fang; Ding, Sifei
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

description	An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
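The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the corpus-extraction step, built on several assumptions the record does not confirm. Segmentation uses forward maximum matching over the union of the conventional and adaptive dictionaries; the patent's actual segmentation algorithm is not stated here. Each class "signature" is modelled as a hypothetical set of keyword tokens, and a string is labeled only when exactly one class matches. Every k-th labeled sample is routed to the testing split. The names `segment`, `detect_class`, `CorpusBuilder`, `extend_adaptive`, `add_document`, and `test_every` are illustrative and do not come from the patent.

```python
from typing import Dict, List, Optional, Set, Tuple

Sample = Tuple[List[str], str]  # (segmented tokens, class label)


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word that matches; fall back to a single character when none does."""
    longest = max((len(w) for w in dictionary if w), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def detect_class(tokens: List[str], signatures: Dict[str, Set[str]]) -> Optional[str]:
    """Return the one class whose signature tokens occur in `tokens`;
    return None when no class or more than one class matches."""
    present = set(tokens)
    hits = [label for label, sig in signatures.items() if present & sig]
    return hits[0] if len(hits) == 1 else None


class CorpusBuilder:
    """Hypothetical helper that grows train/test splits document by document."""

    def __init__(self, conventional: Set[str], adaptive: Set[str],
                 signatures: Dict[str, Set[str]], test_every: int = 5) -> None:
        self.dictionary = set(conventional) | set(adaptive)  # composite dictionary
        self.signatures = signatures
        self.test_every = test_every  # every k-th labeled sample goes to the test split
        self.train: List[Sample] = []
        self.test: List[Sample] = []
        self._labeled = 0

    def extend_adaptive(self, terms: Set[str]) -> None:
        """Add document-specific terms to the composite dictionary."""
        self.dictionary |= set(terms)

    def add_document(self, strings: List[str]) -> int:
        """Segment and auto-label each string of a new document.
        Returns the number of samples added to the corpus."""
        added = 0
        for s in strings:
            tokens = segment(s, self.dictionary)
            label = detect_class(tokens, self.signatures)
            if label is None:
                continue  # no unambiguous signature: leave the string unlabeled
            self._labeled += 1
            split = self.test if self._labeled % self.test_every == 0 else self.train
            split.append((tokens, label))
            added += 1
        return added
```

Calling `add_document` for each newly processed document is what makes the corpus grow iteratively in this sketch. Retraining the classifier on the expanded `train` split, and the choice of machine learning algorithm, are omitted because the record does not specify them.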
publication number	US2018365322A1
publication date	2018-12-20
espacenet	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20181220&DB=EPODOC&CC=US&NR=2018365322A1
recordid	cdi_epo_espacenet_US2018365322A1
source	esp@cenet
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T04%3A28%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Cheng,%20Xiaopei&rft.date=2018-12-20&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2018365322A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true