Speaker tracking method and system based on multi-modal information

Bibliographic Details
Main Authors:	FAN SHENGXU, TIAN JIANKUN, ZHANG DEYUAN, LIU TAO, DU XIAOYONG
Format:	Patent
Language:	Chinese; English
Subjects:	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	The invention discloses a speaker tracking method and system based on multi-modal information, and relates to the field of speaker tracking. The method can be applied to online speaker tracking in offline or online conferences, where it locates the speaker quickly and accurately and provides a close-up of the speaker; it can also be used for the offline task of labeling the speaker in each part of a provided video. When several faces appear in the same picture and the people speak in turn, the method uses the input image and the corresponding audio information to calculate a speaking lip-movement score, a sound-and-appearance matching score, and a lip-shape synchronization score for each face in the image, and locates the specific speaker according to the scores of each face. In addition, voice and face pairs that have been registered and paired can be input in advance, and the voice and face pairs with high pa
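
The scoring step described in the summary can be illustrated with a minimal Python sketch. The summary names three per-face scores but does not say how they are combined, so the weighted sum, the default weights, the threshold, and all names below (FaceScores, combined_score, locate_speaker) are assumptions made for illustration, not the patented method.

```python
from dataclasses import dataclass


@dataclass
class FaceScores:
    """Scores for one face in a frame (hypothetical container)."""
    face_id: str
    lip_movement: float      # speaking lip-movement score
    voice_appearance: float  # sound-and-appearance matching score
    lip_sync: float          # lip-shape synchronization score


def combined_score(face, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three scores (assumed combination rule)."""
    w_move, w_match, w_sync = weights
    return (w_move * face.lip_movement
            + w_match * face.voice_appearance
            + w_sync * face.lip_sync)


def locate_speaker(faces, weights=(1.0, 1.0, 1.0), threshold=0.0):
    """Return the face_id with the highest combined score.

    Returns None when there are no faces or when the best score is
    below the (assumed) threshold, i.e. nobody appears to be speaking.
    """
    if not faces:
        return None
    best = max(faces, key=lambda f: combined_score(f, weights))
    if combined_score(best, weights) < threshold:
        return None
    return best.face_id


# Example frame with three visible faces; the score values are made up.
faces = [
    FaceScores("A", lip_movement=0.2, voice_appearance=0.5, lip_sync=0.1),
    FaceScores("B", lip_movement=0.9, voice_appearance=0.6, lip_sync=0.8),
    FaceScores("C", lip_movement=0.1, voice_appearance=0.4, lip_sync=0.2),
]
print(locate_speaker(faces))
```

In this sketch the speaker is simply the face with the largest combined score. A real system would presumably compute the three scores with trained audio-visual models and smooth the decision over time, neither of which the summary specifies.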