GAZE-BASED AND AUGMENTED AUTOMATIC INTERPRETATION METHOD AND SYSTEM

Bibliographic Details
Main authors:	KIM SANG HUN, KIM NAM HYEONG, YUN SEUNG, LEE MIN KYU, BANG JEONG UK
Format:	Patent
Language:	English; Korean
Subjects:	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	The present invention relates to an automatic interpretation method and system that convert into a target language only the voice of a speaker within the user's viewing range (range of gaze), by using in combination the multimodal (voice and video) information input through a smart device. By combining the voice and image information input to the smart device, the invention may significantly improve automatic interpretation performance with the foreign speaker the user is communicating with, even in a high-noise environment in which multiple speakers talk at the same time. In addition, the invention may determine the situation from text and image information present around the user, and reflect that situation information, together with the multimodal information, in the interpretation engine in real time. The invention may also significantly improve the convenience of the automatic interpretation system for its user, either by displaying the interpreted sentence as an augmented overlay directly next to the speaker's image or by generating synthesized speech that is distinguishable from other utterances.