Multimodal based punctuation and/or casing prediction

Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; proces...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Bodapati, Sravan Babu, Bekal Kannangola, Dhanush, Kirchhoff, Katrin, Sunkara, Monica Lakshmi, Ronanki, Srikanth
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Bodapati, Sravan Babu
Bekal Kannangola, Dhanush
Kirchhoff, Katrin
Sunkara, Monica Lakshmi
Ronanki, Srikanth
description Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing audio of the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.
format Patent
fullrecord <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US11580965B1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US11580965B1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US11580965B13</originalsourceid><addsrcrecordid>eNrjZDD1Lc0pyczNT0nMUUhKLE5NUSgozUsuKU0syczPU0jMS9HPL1JITizOzEtXKChKTclMBknwMLCmJeYUp_JCaW4GRTfXEGcP3dSC_PjU4oLE5NS81JL40GBDQ1MLA0szUydDY2LUAACQ3C2u</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Multimodal based punctuation and/or casing prediction</title><source>esp@cenet</source><creator>Bodapati, Sravan Babu ; Bekal Kannangola, Dhanush ; Kirchhoff, Katrin ; Sunkara, Monica Lakshmi ; Ronanki, Srikanth</creator><creatorcontrib>Bodapati, Sravan Babu ; Bekal Kannangola, Dhanush ; Kirchhoff, Katrin ; Sunkara, Monica Lakshmi ; Ronanki, Srikanth</creatorcontrib><description>Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing audio of the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2023</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20230214&amp;DB=EPODOC&amp;CC=US&amp;NR=11580965B1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25563,76318</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&amp;date=20230214&amp;DB=EPODOC&amp;CC=US&amp;NR=11580965B1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Bodapati, Sravan Babu</creatorcontrib><creatorcontrib>Bekal Kannangola, Dhanush</creatorcontrib><creatorcontrib>Kirchhoff, Katrin</creatorcontrib><creatorcontrib>Sunkara, Monica Lakshmi</creatorcontrib><creatorcontrib>Ronanki, Srikanth</creatorcontrib><title>Multimodal based punctuation and/or casing prediction</title><description>Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing audio of the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2023</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZDD1Lc0pyczNT0nMUUhKLE5NUSgozUsuKU0syczPU0jMS9HPL1JITizOzEtXKChKTclMBknwMLCmJeYUp_JCaW4GRTfXEGcP3dSC_PjU4oLE5NS81JL40GBDQ1MLA0szUydDY2LUAACQ3C2u</recordid><startdate>20230214</startdate><enddate>20230214</enddate><creator>Bodapati, Sravan Babu</creator><creator>Bekal Kannangola, Dhanush</creator><creator>Kirchhoff, Katrin</creator><creator>Sunkara, Monica Lakshmi</creator><creator>Ronanki, Srikanth</creator><scope>EVB</scope></search><sort><creationdate>20230214</creationdate><title>Multimodal based punctuation and/or casing prediction</title><author>Bodapati, Sravan Babu ; Bekal Kannangola, Dhanush ; Kirchhoff, Katrin ; Sunkara, Monica Lakshmi ; Ronanki, Srikanth</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US11580965B13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2023</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Bodapati, Sravan Babu</creatorcontrib><creatorcontrib>Bekal Kannangola, Dhanush</creatorcontrib><creatorcontrib>Kirchhoff, Katrin</creatorcontrib><creatorcontrib>Sunkara, Monica Lakshmi</creatorcontrib><creatorcontrib>Ronanki, Srikanth</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bodapati, Sravan Babu</au><au>Bekal Kannangola, Dhanush</au><au>Kirchhoff, Katrin</au><au>Sunkara, Monica Lakshmi</au><au>Ronanki, Srikanth</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Multimodal based punctuation and/or casing prediction</title><date>2023-02-14</date><risdate>2023</risdate><abstract>Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing audio of the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_epo_espacenet_US11580965B1
source esp@cenet
subjects ACOUSTICS
CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
title Multimodal based punctuation and/or casing prediction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T12%3A26%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Bodapati,%20Sravan%20Babu&rft.date=2023-02-14&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS11580965B1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true