Diarization Driven by Meta-Information Identified in Discussion Content

An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Fousek, Petr, Novak, Miroslav, Saon, George A, Church, Kenneth W, Dimitriadis, Dimitrios B
Format:	Patent
Sprache:	eng
Schlagworte:	ACOUSTICS CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Fousek, Petr Novak, Miroslav Saon, George A Church, Kenneth W Dimitriadis, Dimitrios B
description	An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US2019156835A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US2019156835A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US2019156835A13</originalsourceid><addsrcrecordid>eNrjZHB3yUwsyqxKLMnMz1NwKcosS81TSKpU8E0tSdT1zEvLL8qFSHmmpOaVZKZlpqYoZAIVZhYnlxYXgySc8_NKgFI8DKxpiTnFqbxQmptB2c01xNlDN7UgPz61uCAxOTUvtSQ-NNjIwNDS0NTMwtjU0dCYOFUAK0c02A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Diarization Driven by Meta-Information Identified in Discussion Content</title><source>esp@cenet</source><creator>Fousek, Petr ; Novak, Miroslav ; Saon, George A ; Church, Kenneth W ; Dimitriadis, Dimitrios B</creator><creatorcontrib>Fousek, Petr ; Novak, Miroslav ; Saon, George A ; Church, Kenneth W ; Dimitriadis, Dimitrios B</creatorcontrib><description>An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2019</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20190523&DB=EPODOC&CC=US&NR=2019156835A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25544,76293</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20190523&DB=EPODOC&CC=US&NR=2019156835A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Fousek, Petr</creatorcontrib><creatorcontrib>Novak, Miroslav</creatorcontrib><creatorcontrib>Saon, George A</creatorcontrib><creatorcontrib>Church, Kenneth W</creatorcontrib><creatorcontrib>Dimitriadis, Dimitrios B</creatorcontrib><title>Diarization Driven by Meta-Information Identified in Discussion Content</title><description>An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2019</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZHB3yUwsyqxKLMnMz1NwKcosS81TSKpU8E0tSdT1zEvLL8qFSHmmpOaVZKZlpqYoZAIVZhYnlxYXgySc8_NKgFI8DKxpiTnFqbxQmptB2c01xNlDN7UgPz61uCAxOTUvtSQ-NNjIwNDS0NTMwtjU0dCYOFUAK0c02A</recordid><startdate>20190523</startdate><enddate>20190523</enddate><creator>Fousek, Petr</creator><creator>Novak, Miroslav</creator><creator>Saon, George A</creator><creator>Church, Kenneth W</creator><creator>Dimitriadis, Dimitrios B</creator><scope>EVB</scope></search><sort><creationdate>20190523</creationdate><title>Diarization Driven by Meta-Information Identified in Discussion Content</title><author>Fousek, Petr ; Novak, Miroslav ; Saon, George A ; Church, Kenneth W ; Dimitriadis, Dimitrios B</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US2019156835A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2019</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Fousek, Petr</creatorcontrib><creatorcontrib>Novak, Miroslav</creatorcontrib><creatorcontrib>Saon, George A</creatorcontrib><creatorcontrib>Church, Kenneth W</creatorcontrib><creatorcontrib>Dimitriadis, Dimitrios B</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fousek, Petr</au><au>Novak, Miroslav</au><au>Saon, George A</au><au>Church, Kenneth W</au><au>Dimitriadis, Dimitrios B</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Diarization Driven by Meta-Information Identified in Discussion Content</title><date>2019-05-23</date><risdate>2019</risdate><abstract>An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng
recordid	cdi_epo_espacenet_US2019156835A1
source	esp@cenet
subjects	ACOUSTICS CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
title	Diarization Driven by Meta-Information Identified in Discussion Content
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T17%3A26%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Fousek,%20Petr&rft.date=2019-05-23&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS2019156835A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true