Vertical industry text classification method based on corpus
Main authors: | CHEN ZUOHU, LIANG RUIYAN, WANG HUA, LI CE, HE QINGSU, YANG BO, WEI JUN, GUO FANGLIN, WANG QIONG, YANG SHIBO |
---|---|
Format: | Patent |
Language: | chi ; eng |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | CHEN ZUOHU LIANG RUIYAN WANG HUA LI CE HE QINGSU YANG BO WEI JUN GUO FANGLIN WANG QIONG YANG SHIBO |
description | The invention discloses a corpus-based text classification method for vertical industries. The method first constructs a parent corpus for the vertical industry, then constructs separate sub-corpora for the different types of text data within that industry, and clusters the words in each sub-corpus to form a more precise corpus. It then calculates, one by one, the similarity between newly added vertical-industry text data and the data of each corpus, and classifies the text accordingly. The method is simple, easy to implement, and offers good efficiency and performance. |
format | Patent |
fullrecord | <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_CN112784040A</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>CN112784040A</sourcerecordid><originalsourceid>FETCH-epo_espacenet_CN112784040A3</originalsourceid><addsrcrecordid>eNrjZLAJSy0qyUxOzFHIzEspLS4pqlQoSa0oUUjOSSwuzkwDypRk5ucp5KaWZOSnKCQlFqemKAD5yflFBaXFPAysaYk5xam8UJqbQdHNNcTZQze1ID8-tbggMTk1L7Uk3tnP0NDI3MLEwMTA0ZgYNQAoCDA_</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Vertical industry text classification method based on corpus</title><source>esp@cenet</source><creator>CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG SHIBO</creator><creatorcontrib>CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG SHIBO</creatorcontrib><description>The invention discloses a vertical industry text classification method based on a corpus. The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering on words in each sub-corpus so as to form a more precise corpus, calculating the similarity between the newly added vertical industry text data and the corpus data one by one, and classifying vertical industry text. The method is simple, easy to implement and good in efficiency and performance.
本发明公开了基于语料库的垂直行业文本分类方法,通过首先构建一个垂直行业父语料库,然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库,并对各个子语料库中的单词进行聚类,形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度,从而对垂直行业文本进行分类,本方法简单、易于实现,且效率和性能较好。</description><language>chi ; eng</language><subject>CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; HANDLING RECORD CARRIERS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210511&DB=EPODOC&CC=CN&NR=112784040A$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76289</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210511&DB=EPODOC&CC=CN&NR=112784040A$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>CHEN ZUOHU</creatorcontrib><creatorcontrib>LIANG RUIYAN</creatorcontrib><creatorcontrib>WANG HUA</creatorcontrib><creatorcontrib>LI CE</creatorcontrib><creatorcontrib>HE QINGSU</creatorcontrib><creatorcontrib>YANG BO</creatorcontrib><creatorcontrib>WEI JUN</creatorcontrib><creatorcontrib>GUO FANGLIN</creatorcontrib><creatorcontrib>WANG QIONG</creatorcontrib><creatorcontrib>YANG SHIBO</creatorcontrib><title>Vertical industry text classification method based on corpus</title><description>The invention discloses a vertical industry text classification method based on a corpus. 
The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering on words in each sub-corpus so as to form a more precise corpus, calculating the similarity between the newly added vertical industry text data and the corpus data one by one, and classifying vertical industry text. The method is simple, easy to implement and good in efficiency and performance.
本发明公开了基于语料库的垂直行业文本分类方法,通过首先构建一个垂直行业父语料库,然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库,并对各个子语料库中的单词进行聚类,形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度,从而对垂直行业文本进行分类,本方法简单、易于实现,且效率和性能较好。</description><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>HANDLING RECORD CARRIERS</subject><subject>PHYSICS</subject><subject>PRESENTATION OF DATA</subject><subject>RECOGNITION OF DATA</subject><subject>RECORD CARRIERS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZLAJSy0qyUxOzFHIzEspLS4pqlQoSa0oUUjOSSwuzkwDypRk5ucp5KaWZOSnKCQlFqemKAD5yflFBaXFPAysaYk5xam8UJqbQdHNNcTZQze1ID8-tbggMTk1L7Uk3tnP0NDI3MLEwMTA0ZgYNQAoCDA_</recordid><startdate>20210511</startdate><enddate>20210511</enddate><creator>CHEN ZUOHU</creator><creator>LIANG RUIYAN</creator><creator>WANG HUA</creator><creator>LI CE</creator><creator>HE QINGSU</creator><creator>YANG BO</creator><creator>WEI JUN</creator><creator>GUO FANGLIN</creator><creator>WANG QIONG</creator><creator>YANG SHIBO</creator><scope>EVB</scope></search><sort><creationdate>20210511</creationdate><title>Vertical industry text classification method based on corpus</title><author>CHEN ZUOHU ; LIANG RUIYAN ; WANG HUA ; LI CE ; HE QINGSU ; YANG BO ; WEI JUN ; GUO FANGLIN ; WANG QIONG ; YANG SHIBO</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_CN112784040A3</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>chi ; eng</language><creationdate>2021</creationdate><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>HANDLING RECORD CARRIERS</topic><topic>PHYSICS</topic><topic>PRESENTATION OF DATA</topic><topic>RECOGNITION OF DATA</topic><topic>RECORD CARRIERS</topic><toplevel>online_resources</toplevel><creatorcontrib>CHEN 
ZUOHU</creatorcontrib><creatorcontrib>LIANG RUIYAN</creatorcontrib><creatorcontrib>WANG HUA</creatorcontrib><creatorcontrib>LI CE</creatorcontrib><creatorcontrib>HE QINGSU</creatorcontrib><creatorcontrib>YANG BO</creatorcontrib><creatorcontrib>WEI JUN</creatorcontrib><creatorcontrib>GUO FANGLIN</creatorcontrib><creatorcontrib>WANG QIONG</creatorcontrib><creatorcontrib>YANG SHIBO</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>CHEN ZUOHU</au><au>LIANG RUIYAN</au><au>WANG HUA</au><au>LI CE</au><au>HE QINGSU</au><au>YANG BO</au><au>WEI JUN</au><au>GUO FANGLIN</au><au>WANG QIONG</au><au>YANG SHIBO</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Vertical industry text classification method based on corpus</title><date>2021-05-11</date><risdate>2021</risdate><abstract>The invention discloses a vertical industry text classification method based on a corpus. The method comprises the steps: firstly constructing a vertical industry parent corpus, then constructing different sub-corpora for different types of text data in the vertical industry, carrying out clustering on words in each sub-corpus so as to form a more precise corpus, calculating the similarity between the newly added vertical industry text data and the corpus data one by one, and classifying vertical industry text. The method is simple, easy to implement and good in efficiency and performance.
本发明公开了基于语料库的垂直行业文本分类方法,通过首先构建一个垂直行业父语料库,然后针对垂直行业内不同类型的文本数据分别构建不同的子语料库,并对各个子语料库中的单词进行聚类,形成更加精准的语料库。逐一计算新添加垂直行业文本数据和各个语料库数据之间的相似度,从而对垂直行业文本进行分类,本方法简单、易于实现,且效率和性能较好。</abstract><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | |
ispartof | |
issn | |
language | chi ; eng |
recordid | cdi_epo_espacenet_CN112784040A |
source | esp@cenet |
subjects | CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; HANDLING RECORD CARRIERS ; PHYSICS ; PRESENTATION OF DATA ; RECOGNITION OF DATA ; RECORD CARRIERS |
title | Vertical industry text classification method based on corpus |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T20%3A37%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=CHEN%20ZUOHU&rft.date=2021-05-11&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN112784040A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
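The abstract describes the classification pipeline only at a high level: a parent corpus, one sub-corpus per text type, word clustering, one-by-one similarity against each corpus, then classification. The following is a minimal Python sketch of the classification stage under stated assumptions, not the patented implementation. Whitespace tokenization, TF-IDF weights derived from the parent corpus, and cosine similarity are all assumptions. So are the hypothetical function names (`build_parent_idf`, `build_sub_corpora`, `classify`) and the sample data. The word-clustering step is omitted because the abstract does not name a clustering algorithm. In effect, this is nearest-centroid classification: each sub-corpus is reduced to one weighted word vector.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; Chinese text would need a word segmenter.
    return text.lower().split()


def build_parent_idf(docs: list[str]) -> dict[str, float]:
    """Parent corpus (assumed role): smoothed inverse document frequency over all industry texts."""
    n = len(docs)
    df: Counter[str] = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term frequency weighted by parent-corpus IDF; words unseen in the parent corpus are dropped.
    tf = Counter(tokens)
    return {w: c * idf[w] for w, c in tf.items() if w in idf}


def build_sub_corpora(
    labeled_docs: list[tuple[str, str]], idf: dict[str, float]
) -> dict[str, dict[str, float]]:
    # One sub-corpus per text type, represented by the TF-IDF vector of all its tokens.
    tokens_by_type: dict[str, list[str]] = {}
    for label, doc in labeled_docs:
        tokens_by_type.setdefault(label, []).extend(tokenize(doc))
    return {label: tfidf(toks, idf) for label, toks in tokens_by_type.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse word-weight vectors.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(
    text: str, sub_corpora: dict[str, dict[str, float]], idf: dict[str, float]
) -> tuple[str, dict[str, float]]:
    # Compare the new text with each sub-corpus one by one and return the most similar type.
    vec = tfidf(tokenize(text), idf)
    scores = {label: cosine(vec, prof) for label, prof in sub_corpora.items()}
    return max(scores, key=scores.get), scores


# Hypothetical sample data for illustration only.
labeled = [
    ("contract", "the buyer shall pay the seller within thirty days"),
    ("contract", "the seller shall deliver goods to the buyer"),
    ("maintenance", "replace the pump seal and check the valve pressure"),
    ("maintenance", "inspect the valve and lubricate the pump bearing"),
]
idf = build_parent_idf([doc for _, doc in labeled])
subs = build_sub_corpora(labeled, idf)
label, scores = classify("check pump pressure and valve seal", subs, idf)
print(label)  # maintenance
```

Because each new text is compared with every sub-corpus in turn, classification cost in this sketch grows linearly with the number of text types. This matches the "one by one" comparison the abstract describes. A refinement step such as the patent's word clustering would replace or reweight the per-type vectors before `classify` is called.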