Bad corpus filtering method and system

The invention discloses a bad corpus filtering method and system, and the method comprises the following steps: obtaining a to-be-recognized text corpus, carrying out the preprocessing of the to-be-recognized text corpus, and obtaining a basic text corpus; entities in the basic text corpus are extra...

Bibliographic details
Main authors: CHENG KAILIN, ZHOU YUHAN, LIU KAI, JIANG XIAONING, XIE HONGMIN
Format: Patent
Language: chi ; eng
creator CHENG KAILIN
ZHOU YUHAN
LIU KAI
JIANG XIAONING
XIE HONGMIN
description The invention discloses a method and system for filtering bad corpora. The method comprises the following steps: obtaining a text corpus to be recognized and preprocessing it to obtain a basic text corpus; extracting entities from the basic text corpus and matching them against a bad text knowledge graph to obtain a first recognition result; detecting and recognizing the basic text corpus with a corpus recognition model to obtain a second recognition result; and filtering the text corpus to be recognized according to the first recognition result and/or the second recognition result, and updating the bad text knowledge graph according to the second recognition result. According to the method, bad texts are screened through a knowledge graph technology, and a plurality of candidate bad entities can be obtained by utilizing semantic network essence and strong association capability of th
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_CN115544204A</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>CN115544204A</sourcerecordid><originalsourceid>FETCH-epo_espacenet_CN115544204A3</originalsourceid><addsrcrecordid>eNrjZFBzSkxRSM4vKigtVkjLzClJLcrMS1fITS3JyE9RSMxLUSiuLC5JzeVhYE1LzClO5YXS3AyKbq4hzh66qQX58anFBYnJqXmpJfHOfoaGpqYmJkYGJo7GxKgBAF6mJ2E</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Bad corpus filtering method and system</title><source>esp@cenet</source><creator>CHENG KAILIN ; ZHOU YUHAN ; LIU KAI ; JIANG XIAONING ; XIE HONGMIN</creator><creatorcontrib>CHENG KAILIN ; ZHOU YUHAN ; LIU KAI ; JIANG XIAONING ; XIE HONGMIN</creatorcontrib><description>The invention discloses a bad corpus filtering method and system, and the method comprises the following steps: obtaining a to-be-recognized text corpus, carrying out the preprocessing of the to-be-recognized text corpus, and obtaining a basic text corpus; entities in the basic text corpus are extracted, matching search is conducted on the entities of the basic text corpus according to the bad text knowledge graph, and a first recognition result is obtained; detecting and recognizing the basic text corpus according to a corpus recognition model to obtain a second recognition result; and filtering the to-be-recognized text corpus according to the first recognition result or/and the second recognition result, and updating the bad text knowledge graph according to the second recognition result. 
According to the method, bad texts are screened through a knowledge graph technology, and a plurality of candidate bad entities can be obtained by utilizing semantic network essence and strong association capability of th</description><language>chi ; eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; HANDLING RECORD CARRIERS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS</subject><creationdate>2022</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20221230&amp;DB=EPODOC&amp;CC=CN&amp;NR=115544204A$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76516</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20221230&amp;DB=EPODOC&amp;CC=CN&amp;NR=115544204A$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>CHENG KAILIN</creatorcontrib><creatorcontrib>ZHOU YUHAN</creatorcontrib><creatorcontrib>LIU KAI</creatorcontrib><creatorcontrib>JIANG XIAONING</creatorcontrib><creatorcontrib>XIE HONGMIN</creatorcontrib><title>Bad corpus filtering method and system</title><description>The invention discloses a bad corpus filtering method and system, and the method comprises the following steps: obtaining a to-be-recognized text corpus, carrying out the preprocessing of the to-be-recognized text corpus, and obtaining a basic text corpus; entities in the basic text corpus are extracted, matching search is conducted on the entities of the basic text corpus according to the bad text knowledge graph, 
and a first recognition result is obtained; detecting and recognizing the basic text corpus according to a corpus recognition model to obtain a second recognition result; and filtering the to-be-recognized text corpus according to the first recognition result or/and the second recognition result, and updating the bad text knowledge graph according to the second recognition result. According to the method, bad texts are screened through a knowledge graph technology, and a plurality of candidate bad entities can be obtained by utilizing semantic network essence and strong association capability of th</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>HANDLING RECORD CARRIERS</subject><subject>PHYSICS</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2022</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZFBzSkxRSM4vKigtVkjLzClJLcrMS1fITS3JyE9RSMxLUSiuLC5JzeVhYE1LzClO5YXS3AyKbq4hzh66qQX58anFBYnJqXmpJfHOfoaGpqYmJkYGJo7GxKgBAF6mJ2E</recordid><startdate>20221230</startdate><enddate>20221230</enddate><creator>CHENG KAILIN</creator><creator>ZHOU YUHAN</creator><creator>LIU KAI</creator><creator>JIANG XIAONING</creator><creator>XIE HONGMIN</creator><scope>EVB</scope></search><sort><creationdate>20221230</creationdate><title>Bad corpus filtering method and system</title><author>CHENG KAILIN ; ZHOU YUHAN ; LIU KAI ; JIANG XIAONING ; XIE HONGMIN</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_CN115544204A3</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>chi ; eng</language><creationdate>2022</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON 
SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>HANDLING RECORD CARRIERS</topic><topic>PHYSICS</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><toplevel>online_resources</toplevel><creatorcontrib>CHENG KAILIN</creatorcontrib><creatorcontrib>ZHOU YUHAN</creatorcontrib><creatorcontrib>LIU KAI</creatorcontrib><creatorcontrib>JIANG XIAONING</creatorcontrib><creatorcontrib>XIE HONGMIN</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>CHENG KAILIN</au><au>ZHOU YUHAN</au><au>LIU KAI</au><au>JIANG XIAONING</au><au>XIE HONGMIN</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Bad corpus filtering method and system</title><date>2022-12-30</date><risdate>2022</risdate><abstract>The invention discloses a bad corpus filtering method and system, and the method comprises the following steps: obtaining a to-be-recognized text corpus, carrying out the preprocessing of the to-be-recognized text corpus, and obtaining a basic text corpus; entities in the basic text corpus are extracted, matching search is conducted on the entities of the basic text corpus according to the bad text knowledge graph, and a first recognition result is obtained; detecting and recognizing the basic text corpus according to a corpus recognition model to obtain a second recognition result; and filtering the to-be-recognized text corpus according to the first recognition result or/and the second recognition result, and updating the bad text knowledge graph according to the second recognition result. 
According to the method, bad texts are screened through a knowledge graph technology, and a plurality of candidate bad entities can be obtained by utilizing semantic network essence and strong association capability of th</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
language chi ; eng
recordid cdi_epo_espacenet_CN115544204A
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
HANDLING RECORD CARRIERS
PHYSICS
PRESENTATION OF DATA
RECOGNITION OF DATA
RECORD CARRIERS
title Bad corpus filtering method and system
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T03%3A40%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHENG%20KAILIN&rft.date=2022-12-30&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN115544204A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
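
The steps listed in the description field (preprocess, match entities against a bad text knowledge graph, run a recognition model, filter, update the graph) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not taken from the patent: the names BadTextGraph, preprocess, extract_entities, filter_corpus and toy_model are hypothetical, the "knowledge graph" is a plain dictionary, entity extraction is simple word tokenization, and the graph update rule (linking all entities of a text flagged by the model) is only one possible reading of "updating the bad text knowledge graph according to the second recognition result".

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class BadTextGraph:
    """Toy stand-in for the bad text knowledge graph: entity -> related entities."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def contains(self, entity: str) -> bool:
        return entity in self.edges

    def add_cooccurring(self, entities: List[str]) -> None:
        # Assumption: entities that appear together in a flagged text get linked.
        for e in entities:
            self.edges.setdefault(e, set()).update(x for x in entities if x != e)


def preprocess(text: str) -> str:
    # Assumption: preprocessing = lowercasing and collapsing whitespace.
    return re.sub(r"\s+", " ", text.strip().lower())


def extract_entities(text: str) -> List[str]:
    # Assumption: every word token is an entity; a real system would use NER.
    return re.findall(r"\w+", text)


def filter_corpus(
    texts: List[str],
    graph: BadTextGraph,
    model: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Return the texts that neither the graph match nor the model flags."""
    kept = []
    for raw in texts:
        basic = preprocess(raw)
        entities = extract_entities(basic)
        first = any(graph.contains(e) for e in entities)  # first recognition result
        second = model(basic) >= threshold                # second recognition result
        if second:
            graph.add_cooccurring(entities)               # update graph from model hit
        if not (first or second):
            kept.append(raw)
    return kept


def toy_model(text: str) -> float:
    # Hypothetical classifier: scores 1.0 when the word "scam" occurs.
    return 1.0 if "scam" in text else 0.0


graph = BadTextGraph({"spam": set()})
corpus = [
    "Buy SPAM now",
    "totally fine sentence",
    "scam lottery winner",
    "lottery tickets here",
]
kept = filter_corpus(corpus, graph, toy_model)
print(kept)
```

In this toy run the third text is caught only by the model, and its entities are then added to the graph, so the fourth text is caught by the graph match alone. Linking every token of a flagged text would pollute a real graph with harmless words, so a practical update step would need some form of entity selection that this sketch leaves out.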