Multi-person long speech semantic recognition and abstract generation method, system and device and medium

The invention provides a method, system, device, and medium for multi-person long-speech semantic recognition and abstract generation. The method comprises the steps of: enhancing a noisy voice signal with a Deducs model to obtain a real voice signal; voice...
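
The stages named in this record's description field (enhancement, voiceprint-based speaker identification, speech-to-text, and abstract generation) can be sketched as a simple chain. This is a minimal illustration only, not the patented implementation: the function name `summarize_meeting`, the `Utterance` type, the pre-segmented input, and the four stage callables are all hypothetical placeholders standing in for the Deducs, ERes2Net, Conformer, and large language models, none of which are loaded here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in for a waveform; a real system would use audio arrays.
Audio = Sequence[float]


@dataclass
class Utterance:
    speaker: str  # identity matched against the voiceprint library
    text: str     # recognized speech content


def summarize_meeting(
    noisy_segments: List[Audio],
    enhance: Callable[[Audio], Audio],          # placeholder for the enhancement model
    identify_speaker: Callable[[Audio], str],   # placeholder for voiceprint matching
    transcribe: Callable[[Audio], str],         # placeholder for the recognition model
    summarize: Callable[[str], str],            # placeholder for the large language model
) -> str:
    """Chain the four stages; assumes the audio is already split into segments."""
    utterances = []
    for noisy in noisy_segments:
        clean = enhance(noisy)  # step 1: noisy signal -> enhanced signal
        utterances.append(
            Utterance(identify_speaker(clean), transcribe(clean))  # steps 2 and 3
        )
    # Build a speaker-attributed transcript, then generate the abstract (step 4).
    transcript = "\n".join(f"{u.speaker}: {u.text}" for u in utterances)
    return summarize(transcript)
```

Passing the stages in as callables keeps the sketch independent of any particular model library; each placeholder could be replaced by a real model wrapper without changing the control flow.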

Detailed description

Bibliographic details
Main authors:	LEI FANGCHAO, SONG MINGZHEN, WANG JIATING, XU CHUANZHAO, LIU WEIFENG
Format:	Patent
Language:	chi ; eng
Subjects:	ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	LEI FANGCHAO ; SONG MINGZHEN ; WANG JIATING ; XU CHUANZHAO ; LIU WEIFENG
description	The invention provides a method, system, device, and medium for multi-person long-speech semantic recognition and abstract generation. The method comprises the steps of: enhancing a noisy voice signal with a Deducs model to obtain a real voice signal; extracting voiceprint features from the real voice signal with an ERes2Net model, building a personal voiceprint library from those features, and recognizing the identities of different speakers; inputting the real voice signal into a trained Conformer end-to-end model for voice content recognition, matching the identity of the speaker, and converting the spoken content into text; and generating an abstract from the spoken-content text with a large language model. According to the method, the context information of long text can be effectively captured, the attention paid to local information is high, and the technical problem that technical terms or contexts in the specific field ar
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_CN118262705A</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>CN118262705A</sourcerecordid><originalsourceid>FETCH-epo_espacenet_CN118262705A3</originalsourceid><addsrcrecordid>eNqNzLsKwkAUBNA0FqL-w7U3YCI-WgmKjVb2Yd0dk5Xsg703gn9vCH6A1QzDYabZ69p3YvOIxMFTF3xDHAHdEsMpL1ZTgg6Nt2IHoLwh9WBJSgs18Ehq3B2kDWZF_GGBG5nB22qM1cHY3s2zyVN1jMUvZ9nyfLpXlxwx1OCo9PAndXUrikO5K_fr7XHzj_kCvRRBOQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Multi-person long speech semantic recognition and abstract generation method, system and device and medium</title><source>esp@cenet</source><creator>LEI FANGCHAO ; SONG MINGZHEN ; WANG JIATING ; XU CHUANZHAO ; LIU WEIFENG</creator><creatorcontrib>LEI FANGCHAO ; SONG MINGZHEN ; WANG JIATING ; XU CHUANZHAO ; LIU WEIFENG</creatorcontrib><description>The invention provides a multi-person long voice semantic recognition and abstract generation method, system and device and a medium, and the method comprises the steps: carrying out the voice signal enhancement of a noisy voice signal through a Deducs model, and obtaining a real voice signal; voiceprint features are extracted from the real voice signals through an ERes2Net model, a personal voiceprint library is formed based on the voiceprint features, and identities of different speakers are recognized; inputting a real voice signal into the trained Conformer end-to-end model for voice content recognition, matching the identity of a speaker, and converting the speaking content into a text; carrying out abstract generation on the speaking content text through a large language model; according to the method, the context information of the long text can be effectively captured, the attention degree of local information is high, and the technical problem that technical terms 
or contexts in the specific field ar</description><language>chi ; eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240628&DB=EPODOC&CC=CN&NR=118262705A$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76290</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240628&DB=EPODOC&CC=CN&NR=118262705A$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>LEI FANGCHAO</creatorcontrib><creatorcontrib>SONG MINGZHEN</creatorcontrib><creatorcontrib>WANG JIATING</creatorcontrib><creatorcontrib>XU CHUANZHAO</creatorcontrib><creatorcontrib>LIU WEIFENG</creatorcontrib><title>Multi-person long speech semantic recognition and abstract generation method, system and device and medium</title><description>The invention provides a multi-person long voice semantic recognition and abstract generation method, system and device and a medium, and the method comprises the steps: carrying out the voice signal enhancement of a noisy voice signal through a Deducs model, and obtaining a real voice signal; voiceprint features are extracted from the real voice signals through an ERes2Net model, a personal voiceprint library is formed based on the voiceprint features, and identities of different speakers are recognized; inputting a real voice 
signal into the trained Conformer end-to-end model for voice content recognition, matching the identity of a speaker, and converting the speaking content into a text; carrying out abstract generation on the speaking content text through a large language model; according to the method, the context information of the long text can be effectively captured, the attention degree of local information is high, and the technical problem that technical terms or contexts in the specific field ar</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNqNzLsKwkAUBNA0FqL-w7U3YCI-WgmKjVb2Yd0dk5Xsg703gn9vCH6A1QzDYabZ69p3YvOIxMFTF3xDHAHdEsMpL1ZTgg6Nt2IHoLwh9WBJSgs18Ehq3B2kDWZF_GGBG5nB22qM1cHY3s2zyVN1jMUvZ9nyfLpXlxwx1OCo9PAndXUrikO5K_fr7XHzj_kCvRRBOQ</recordid><startdate>20240628</startdate><enddate>20240628</enddate><creator>LEI FANGCHAO</creator><creator>SONG MINGZHEN</creator><creator>WANG JIATING</creator><creator>XU CHUANZHAO</creator><creator>LIU WEIFENG</creator><scope>EVB</scope></search><sort><creationdate>20240628</creationdate><title>Multi-person long speech semantic recognition and abstract generation method, system and device and medium</title><author>LEI FANGCHAO ; SONG MINGZHEN ; WANG JIATING ; XU CHUANZHAO ; LIU WEIFENG</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_CN118262705A3</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>chi ; 
eng</language><creationdate>2024</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>LEI FANGCHAO</creatorcontrib><creatorcontrib>SONG MINGZHEN</creatorcontrib><creatorcontrib>WANG JIATING</creatorcontrib><creatorcontrib>XU CHUANZHAO</creatorcontrib><creatorcontrib>LIU WEIFENG</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>LEI FANGCHAO</au><au>SONG MINGZHEN</au><au>WANG JIATING</au><au>XU CHUANZHAO</au><au>LIU WEIFENG</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Multi-person long speech semantic recognition and abstract generation method, system and device and medium</title><date>2024-06-28</date><risdate>2024</risdate><abstract>The invention provides a multi-person long voice semantic recognition and abstract generation method, system and device and a medium, and the method comprises the steps: carrying out the voice signal enhancement of a noisy voice signal through a Deducs model, and obtaining a real voice signal; voiceprint features are extracted from the real voice signals through an ERes2Net model, a personal voiceprint library is formed based on the voiceprint features, and identities of different speakers are recognized; inputting a real voice signal into the trained Conformer end-to-end model for voice content recognition, matching the identity of a speaker, and converting the speaking content into a text; carrying out abstract generation on the speaking content text through a large language 
model; according to the method, the context information of the long text can be effectively captured, the attention degree of local information is high, and the technical problem that technical terms or contexts in the specific field ar</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	chi ; eng
recordid	cdi_epo_espacenet_CN118262705A
source	esp@cenet
subjects	ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION
title	Multi-person long speech semantic recognition and abstract generation method, system and device and medium
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T20%3A48%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=LEI%20FANGCHAO&rft.date=2024-06-28&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN118262705A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true