Vertical industry text classification method based on corpus

The invention discloses a vertical industry text classification method based on a corpus. The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering...
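
The steps named in the abstract (parent corpus, per-type sub-corpora, refinement of each sub-corpus, similarity-based classification) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patented implementation: the record does not say how words are clustered or which similarity measure is used, so the sketch substitutes a simple filter that drops words shared by every sub-corpus in place of clustering, and uses cosine similarity over term counts. All function names, category labels, and sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Chinese text would
    # need a proper word segmenter instead.
    return text.lower().split()


def build_sub_corpora(labeled_docs):
    """Map each text type (label) to a term-frequency Counter."""
    sub_corpora = {}
    for label, text in labeled_docs:
        sub_corpora.setdefault(label, Counter()).update(tokenize(text))
    return sub_corpora


def refine(sub_corpora):
    # Stand-in for the word-clustering step (NOT clustering itself):
    # drop words that occur in every sub-corpus, since they do not
    # help tell the text types apart.
    if len(sub_corpora) < 2:
        return sub_corpora
    shared = set.intersection(*(set(c) for c in sub_corpora.values()))
    return {
        label: Counter({w: n for w, n in counts.items() if w not in shared})
        for label, counts in sub_corpora.items()
    }


def cosine(a, b):
    # Assumption: cosine similarity; the record names no specific measure.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, sub_corpora):
    """Score the new text against each sub-corpus one by one; pick the best."""
    doc = Counter(tokenize(text))
    scores = {label: cosine(doc, c) for label, c in sub_corpora.items()}
    return max(scores, key=scores.get), scores


# Hypothetical sample data for two text types in one industry.
labeled_docs = [
    ("contract", "the supplier shall deliver the goods under the contract terms"),
    ("contract", "payment terms and penalty clauses of the contract"),
    ("maintenance", "the transformer inspection found oil leakage"),
    ("maintenance", "replace the faulty transformer and log the inspection"),
]

raw_sub_corpora = build_sub_corpora(labeled_docs)
# Parent corpus: all vocabulary of the industry, across every text type.
parent_corpus = sum(raw_sub_corpora.values(), Counter())
sub_corpora = refine(raw_sub_corpora)

label, scores = classify("inspection of the transformer oil", sub_corpora)
print(label, scores)
```

In this toy setup the parent corpus is simply the union of the sub-corpora; a real system would presumably build it from a much larger body of industry text and use it to restrict or weight the vocabulary.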

Bibliographic details
Main authors:	CHEN ZUOHU, LIANG RUIYAN, WANG HUA, LI CE, HE QINGSU, YANG BO, WEI JUN, GUO FANGLIN, WANG QIONG, YANG SHIBO
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; HANDLING RECORD CARRIERS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS

creator	CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG SHIBO
description	The invention discloses a corpus-based text classification method for vertical industries. The method first constructs a parent corpus for the vertical industry, then constructs a separate sub-corpus for each type of text data within that industry, and clusters the words in each sub-corpus to form a more precise corpus. The similarity between newly added vertical-industry text data and the data of each corpus is then calculated one by one, and the text is classified accordingly. The method is simple, easy to implement, and offers good efficiency and performance. 本发明公开了基于语料库的垂直行业文本分类方法，通过首先构建一个垂直行业父语料库，然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库，并对各个子语料库中的单词进行聚类，形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度，从而对垂直行业文本进行分类，本方法简单、易于实现，且效率和性能较好。
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_CN112784040A</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>CN112784040A</sourcerecordid><originalsourceid>FETCH-epo_espacenet_CN112784040A3</originalsourceid><addsrcrecordid>eNrjZLAJSy0qyUxOzFHIzEspLS4pqlQoSa0oUUjOSSwuzkwDypRk5ucp5KaWZOSnKCQlFqemKAD5yflFBaXFPAysaYk5xam8UJqbQdHNNcTZQze1ID8-tbggMTk1L7Uk3tnP0NDI3MLEwMTA0ZgYNQAoCDA_</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Vertical industry text classification method based on corpus</title><source>esp@cenet</source><creator>CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG SHIBO</creator><creatorcontrib>CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG SHIBO</creatorcontrib><description>The invention discloses a vertical industry text classification method based on a corpus. The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering on words in each sub-corpus so as to form a more precise corpus, calculating the similarity between the newly added vertical industry text data and the corpus data one by one, and classifying vertical industry text. The method is simple, easy to implement and good in efficiency and performance. 
本发明公开了基于语料库的垂直行业文本分类方法，通过首先构建一个垂直行业父语料库，然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库，并对各个子语料库中的单词进行聚类，形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度，从而对垂直行业文本进行分类，本方法简单、易于实现，且效率和性能较好。</description><language>chi ; eng</language><subject>CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; HANDLING RECORD CARRIERS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210511&DB=EPODOC&CC=CN&NR=112784040A$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76289</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210511&DB=EPODOC&CC=CN&NR=112784040A$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>CHEN ZUOHU</creatorcontrib><creatorcontrib>LIANG RUIYAN</creatorcontrib><creatorcontrib>WANG HUA</creatorcontrib><creatorcontrib>LI CE</creatorcontrib><creatorcontrib>HE QINGSU</creatorcontrib><creatorcontrib>YANG BO</creatorcontrib><creatorcontrib>WEI JUN</creatorcontrib><creatorcontrib>GUO FANGLIN</creatorcontrib><creatorcontrib>WANG QIONG</creatorcontrib><creatorcontrib>YANG SHIBO</creatorcontrib><title>Vertical industry text classification method based on corpus</title><description>The invention discloses a vertical industry text classification method based on a corpus. 
The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering on words in each sub-corpus so as to form a more precise corpus, calculating the similarity between the newly added vertical industry text data and the corpus data one by one, and classifying vertical industry text. The method is simple, easy to implement and good in efficiency and performance. 本发明公开了基于语料库的垂直行业文本分类方法，通过首先构建一个垂直行业父语料库，然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库，并对各个子语料库中的单词进行聚类，形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度，从而对垂直行业文本进行分类，本方法简单、易于实现，且效率和性能较好。</description><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>HANDLING RECORD CARRIERS</subject><subject>PHYSICS</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZLAJSy0qyUxOzFHIzEspLS4pqlQoSa0oUUjOSSwuzkwDypRk5ucp5KaWZOSnKCQlFqemKAD5yflFBaXFPAysaYk5xam8UJqbQdHNNcTZQze1ID8-tbggMTk1L7Uk3tnP0NDI3MLEwMTA0ZgYNQAoCDA_</recordid><startdate>20210511</startdate><enddate>20210511</enddate><creator>CHEN ZUOHU</creator><creator>LIANG RUIYAN</creator><creator>WANG HUA</creator><creator>LI CE</creator><creator>HE QINGSU</creator><creator>YANG BO</creator><creator>WEI JUN</creator><creator>GUO FANGLIN</creator><creator>WANG QIONG</creator><creator>YANG SHIBO</creator><scope>EVB</scope></search><sort><creationdate>20210511</creationdate><title>Vertical industry text classification method based on corpus</title><author>CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG 
SHIBO</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_CN112784040A3</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>chi ; eng</language><creationdate>2021</creationdate><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>HANDLING RECORD CARRIERS</topic><topic>PHYSICS</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><toplevel>online_resources</toplevel><creatorcontrib>CHEN ZUOHU</creatorcontrib><creatorcontrib>LIANG RUIYAN</creatorcontrib><creatorcontrib>WANG HUA</creatorcontrib><creatorcontrib>LI CE</creatorcontrib><creatorcontrib>HE QINGSU</creatorcontrib><creatorcontrib>YANG BO</creatorcontrib><creatorcontrib>WEI JUN</creatorcontrib><creatorcontrib>GUO FANGLIN</creatorcontrib><creatorcontrib>WANG QIONG</creatorcontrib><creatorcontrib>YANG SHIBO</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>CHEN ZUOHU</au><au>LIANG RUIYAN</au><au>WANG HUA</au><au>LI CE</au><au>HE QINGSU</au><au>YANG BO</au><au>WEI JUN</au><au>GUO FANGLIN</au><au>WANG QIONG</au><au>YANG SHIBO</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Vertical industry text classification method based on corpus</title><date>2021-05-11</date><risdate>2021</risdate><abstract>The invention discloses a vertical industry text classification method based on a corpus. 
The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering on words in each sub-corpus so as to form a more precise corpus, calculating the similarity between the newly added vertical industry text data and the corpus data one by one, and classifying vertical industry text. The method is simple, easy to implement and good in efficiency and performance. 本发明公开了基于语料库的垂直行业文本分类方法，通过首先构建一个垂直行业父语料库，然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库，并对各个子语料库中的单词进行聚类，形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度，从而对垂直行业文本进行分类，本方法简单、易于实现，且效率和性能较好。</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
language	chi ; eng
recordid	cdi_epo_espacenet_CN112784040A
source	esp@cenet
subjects	CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; HANDLING RECORD CARRIERS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS
title	Vertical industry text classification method based on corpus
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T20%3A37%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHEN%20ZUOHU&rft.date=2021-05-11&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN112784040A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true