SPEECH TRANSCRIPTION USING MULTIPLE DATA SOURCES

This disclosure describes transcribing speech using audio, image, and other data. A system is described that includes an audio capture system configured to capture audio data associated with a plurality of speakers, an image capture system configured to capture images of one or more of the plurality...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	CHEUNG, Vincent Charles, BAI, Chengxuan, SHENG, Yating Sasha
Format:	Patent
Sprache:	eng ; fre ; ger
Schlagworte:	ACOUSTICS ELECTRIC COMMUNICATION TECHNIQUE ELECTRICITY MUSICAL INSTRUMENTS OPTICAL ELEMENTS, SYSTEMS, OR APPARATUS OPTICS PHYSICS PICTORIAL COMMUNICATION, e.g. TELEVISION SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This disclosure describes transcribing speech using audio, image, and other data. A system is described that includes an audio capture system configured to capture audio data associated with a plurality of speakers, an image capture system configured to capture images of one or more of the plurality of speakers, and a speech processing engine. The speech processing engine may be configured to recognize a plurality of speech segments in the audio data, identify, for each speech segment of the plurality of speech segments and based on the images, a speaker associated with the speech segment, transcribe each of the plurality of speech segments to produce a transcription of the plurality of speech segments including, for each speech segment in the plurality of speech segments, an indication of the speaker associated with the speech segment, and analyze the transcription to produce additional data derived from the transcription.