Multi-modal fusion detection method for deeply-forged audio and video

Bibliographic details
Main authors:	PENG XUEKANG, LIAN ZHICHAO, WANG SHUJUAN
Format:	Patent
Language:	Chinese; English
Keywords:	ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	The invention discloses a multi-modal fusion method for detecting deeply forged (deepfake) audio and video, in the field of multi-modal machine learning. The method builds a network from three components to perform audio-video multi-modal identification: a time sequence-space (spatio-temporal) feature extractor, a cross-attention cross-modal joint learning decoder, and a multi-modal classification detector. The feature extractor processes the audio and video modal features in a unified way; the cross-attention decoder lets the two modalities learn jointly through two parallel decoders; and the multi-modal classification detector fuses the feature information of both modalities and outputs a binary classification result. By exploiting the complementarity of the audio and image modalities, the method can determine whether a video has been forged even when the forgery is not easily perceived by humans, and the …
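The abstract names three components but does not describe their internals. The sketch below is a minimal, untrained NumPy illustration of how such a pipeline could be wired; it is not the patented method. Assumptions: per-modality linear projections stand in for the time sequence-space feature extractor; each of the two parallel decoders is a single scaled dot-product cross-attention layer with a residual connection (audio queries attend to video, and video queries attend to audio); fusion is mean pooling plus concatenation followed by a logistic output. All names (`FusionDetectorSketch`, `cross_attention`, the feature dimensions) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    # Scaled dot-product cross-attention: queries come from one modality,
    # keys and values from the other. Sequence lengths may differ.
    q = query_seq @ w_q                        # (Tq, d)
    k = context_seq @ w_k                      # (Tk, d)
    v = context_seq @ w_v                      # (Tk, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Tq, Tk)
    return softmax(scores, axis=-1) @ v        # (Tq, d)


class FusionDetectorSketch:
    """Hypothetical audio-video fusion detector with random, untrained weights."""

    def __init__(self, d_audio, d_video, d_model, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            # Scaled Gaussian initialisation keeps activations roughly unit-sized.
            return rng.normal(0.0, 1.0 / np.sqrt(shape[0]), size=shape)

        # Stand-in for the feature extractor: project both modalities
        # into one shared d_model-dimensional space.
        self.proj_a = init(d_audio, d_model)
        self.proj_v = init(d_video, d_model)
        # Two parallel cross-attention decoders (query, key, value weights each).
        self.a_to_v = [init(d_model, d_model) for _ in range(3)]
        self.v_to_a = [init(d_model, d_model) for _ in range(3)]
        # Binary classifier on the fused (concatenated) representation.
        self.w_cls = init(2 * d_model, 1)
        self.b_cls = 0.0

    def forward(self, audio_feats, video_feats):
        a = audio_feats @ self.proj_a          # (Ta, d_model)
        v = video_feats @ self.proj_v          # (Tv, d_model)
        # Joint learning: each modality attends to the other, plus a residual.
        a_dec = a + cross_attention(a, v, *self.a_to_v)
        v_dec = v + cross_attention(v, a, *self.v_to_a)
        # Fusion: mean-pool over time, then concatenate the two modalities.
        fused = np.concatenate([a_dec.mean(axis=0), v_dec.mean(axis=0)])
        logit = (fused @ self.w_cls)[0] + self.b_cls
        # Sigmoid output, read here as the probability that the clip is forged.
        return float(1.0 / (1.0 + np.exp(-logit)))


# Usage with random stand-in features of different lengths per modality.
rng = np.random.default_rng(1)
audio = rng.normal(size=(50, 40))    # e.g. 50 audio frames x 40 features
video = rng.normal(size=(16, 128))   # e.g. 16 video frames x 128 features
model = FusionDetectorSketch(d_audio=40, d_video=128, d_model=32)
p = model.forward(audio, video)
label = "forged" if p >= 0.5 else "real"
```

Because cross-attention compares every query frame with every context frame, the audio and video streams need not share a frame rate or length. In a real detector, the projections would be replaced by learned spatio-temporal encoders and the weights would be trained on labeled real and forged clips.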