Speech transcription using multiple data sources

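This disclosure describes transcribing speech using audio, image, and other data. A system is described that includes an audio capture system configured to capture audio data associated with a plurality of speakers, an image capture system configured to capture images of one or more of the plurality of speakers, and a speech processing engine. The speech processing engine may be configured to recognize a plurality of speech segments in the audio data, identify, for each speech segment of the plurality of speech segments and based on the images, a speaker associated with the speech segment, ...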

Bibliographic details
Main authors:	CHEUNG VINCENT CHARLES, SHENG YATING SASHA, BAI CHENGXUAN
Format:	Patent
Language:	kor
Subjects:	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; ELECTRIC COMMUNICATION TECHNIQUE; ELECTRICITY; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; MUSICAL INSTRUMENTS; OPTICAL ELEMENTS, SYSTEMS, OR APPARATUS; OPTICS; PHYSICS; PICTORIAL COMMUNICATION, e.g. TELEVISION; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	This disclosure describes transcribing speech using audio, image, and other data. A system is described that includes an audio capture system configured to capture audio data associated with a plurality of speakers, an image capture system configured to capture images of one or more of the plurality of speakers, and a speech processing engine. The speech processing engine may be configured to recognize a plurality of speech segments in the audio data, identify, for each speech segment of the plurality of speech segments and based on the images, a speaker associated with the speech segment, transcribe each of the plurality of speech segments to produce a transcription of the plurality of speech segments including, for each speech segment in the plurality of speech segments, an indication of the speaker associated with the speech segment, and analyze the transcription to produce additional data derived from the transcription.
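
The flow in the abstract (recognize speech segments, attribute each to a speaker, transcribe, then analyze the result) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `SpeechSegment`, `TranscriptEntry`, `transcribe_segments`, and `words_per_speaker` are illustrative names, the image-based speaker identification and the speech recognizer are replaced by lookup-table stubs, and a per-speaker word count stands in for the "additional data derived from the transcription".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class SpeechSegment:
    """A span of the captured audio, in seconds (hypothetical structure)."""
    start_s: float
    end_s: float


@dataclass(frozen=True)
class TranscriptEntry:
    """One transcribed segment together with an indication of its speaker."""
    segment: SpeechSegment
    speaker: str
    text: str


def transcribe_segments(
    segments: Iterable[SpeechSegment],
    identify_speaker: Callable[[SpeechSegment], str],
    transcribe_audio: Callable[[SpeechSegment], str],
) -> List[TranscriptEntry]:
    """Attach a speaker label and a text to every segment.

    Both callables are placeholders: a real system would back
    identify_speaker with image data and transcribe_audio with a
    speech recognizer.
    """
    return [
        TranscriptEntry(seg, identify_speaker(seg), transcribe_audio(seg))
        for seg in segments
    ]


def words_per_speaker(transcript: Iterable[TranscriptEntry]) -> Dict[str, int]:
    """Example of derived data: how many words each speaker said."""
    counts: Counter = Counter()
    for entry in transcript:
        counts[entry.speaker] += len(entry.text.split())
    return dict(counts)


# Stubbed inputs, keyed by segment start time (assumed for illustration only).
segments = [SpeechSegment(0.0, 1.5), SpeechSegment(1.5, 4.0), SpeechSegment(4.0, 5.0)]
fake_speakers = {0.0: "Speaker A", 1.5: "Speaker B", 4.0: "Speaker A"}
fake_text = {0.0: "hello everyone", 1.5: "hi shall we start", 4.0: "yes"}

transcript = transcribe_segments(
    segments,
    identify_speaker=lambda seg: fake_speakers[seg.start_s],
    transcribe_audio=lambda seg: fake_text[seg.start_s],
)
summary = words_per_speaker(transcript)
```

Passing the identification and recognition steps in as callables keeps the sketch independent of any particular vision or speech library; only the shape of the speaker-attributed transcript is shown.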