Vertical industry text classification method based on corpus

Bibliographic details
Main authors: CHEN ZUOHU, LIANG RUIYAN, WANG HUA, LI CE, HE QINGSU, YANG BO, WEI JUN, GUO FANGLIN, WANG QIONG, YANG SHIBO
Format: Patent
Language: Chinese; English
description The invention discloses a corpus-based method for classifying text in a vertical industry. The method comprises the following steps: first, a parent corpus is constructed for the vertical industry; next, a separate sub-corpus is constructed for each type of text data in that industry; the words in each sub-corpus are then clustered to form a more precise corpus; finally, the similarity between newly added vertical-industry text data and each corpus is calculated one by one, and the text is classified accordingly. The method is simple, easy to implement, and efficient, with good performance.
本发明公开了基于语料库的垂直行业文本分类方法,通过首先构建一个垂直行业父语料库,然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库,并对各个子语料库中的单词进行聚类,形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度,从而对垂直行业文本进行分类,本方法简单、易于实现,且效率和性能较好。
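The abstract gives the pipeline only at the level of steps. As one possible reading, and not the patented implementation, the following Python sketch wires those steps together. Everything concrete in it is an assumption made for illustration: pre-tokenised input, the hypothetical function names (`build_parent_corpus`, `build_sub_corpora`, `refine_sub_corpus`, `classify`), the specificity score and two-cluster 1-D k-means used for the word-clustering step, and cosine similarity over word counts as the similarity measure.

```python
import math
from collections import Counter

# Hypothetical sketch: documents are assumed to be pre-tokenised lists of words,
# grouped by text type (the dict keys act as class labels).

def build_parent_corpus(docs_by_type):
    """Parent corpus: word counts over every document in the vertical industry."""
    parent = Counter()
    for docs in docs_by_type.values():
        for doc in docs:
            parent.update(doc)
    return parent


def build_sub_corpora(docs_by_type):
    """One sub-corpus (word counts) per type of text data."""
    return {label: Counter(w for doc in docs for w in doc)
            for label, docs in docs_by_type.items()}


def refine_sub_corpus(sub, parent):
    """Assumed clustering step: split the sub-corpus words into two clusters by a
    specificity score (1-D 2-means) and keep only the high-specificity cluster."""
    total_sub, total_parent = sum(sub.values()), sum(parent.values())
    # Relative frequency in the sub-corpus divided by relative frequency in the parent.
    score = {w: (c / total_sub) / (parent[w] / total_parent) for w, c in sub.items()}
    lo, hi = min(score.values()), max(score.values())
    if lo == hi:
        return Counter(sub)  # all words equally specific: nothing to separate
    for _ in range(50):
        # Assign each word to the nearer centroid, then recompute both centroids.
        high = {w for w, s in score.items() if abs(s - hi) < abs(s - lo)}
        new_lo = sum(s for w, s in score.items() if w not in high) / (len(score) - len(high))
        new_hi = sum(score[w] for w in high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster assignment has stabilised
        lo, hi = new_lo, new_hi
    return Counter({w: sub[w] for w in high})


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens, corpora):
    """Compare the new text with each corpus in turn and return the most similar label."""
    vec = Counter(tokens)
    return max(corpora, key=lambda label: cosine(vec, corpora[label]))


# Toy usage with made-up labels and documents.
docs_by_type = {
    "contract": [["party", "clause", "payment", "term"], ["clause", "liability", "party"]],
    "invoice": [["invoice", "amount", "payment", "tax"], ["amount", "tax", "due"]],
}
parent = build_parent_corpus(docs_by_type)
refined = {label: refine_sub_corpus(sub, parent)
           for label, sub in build_sub_corpora(docs_by_type).items()}
print(classify(["tax", "amount", "invoice"], refined))  # prints: invoice
```

In this toy run, "payment" occurs equally often in both text types, so it falls into the low-specificity cluster and is dropped from both refined corpora; the new text then overlaps only with the refined invoice corpus. Other clustering features, such as word embeddings, would fit the abstract's description equally well; the choice here only keeps the sketch free of third-party dependencies.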
date 2021-05-11
publication number CN112784040A
access free to read
link https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210511&DB=EPODOC&CC=CN&NR=112784040A
recordid cdi_epo_espacenet_CN112784040A
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
HANDLING RECORD CARRIERS
PHYSICS
PRESENTATION OF DATA
RECOGNITION OF DATA
RECORD CARRIERS