Multi-modal fusion detection method for deeply-forged audio and video

Bibliographic details
Main authors: PENG XUEKANG, LIAN ZHICHAO, WANG SHUJUAN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a multi-modal fusion detection method for deeply forged audio and video, belonging to the field of multi-modal machine learning. The method constructs a network architecture from a temporal-spatial feature extractor, a cross-attention cross-modal joint learning decoder, and a multi-modal classification detector to perform audio-video multi-modal identification. The temporal-spatial feature extractor processes the audio and video modal features in a unified way; the cross-attention cross-modal joint learning decoder lets the two modalities learn jointly through two parallel decoders; and the multi-modal classification detector fuses the feature information of the two modalities and outputs a binary classification result. By exploiting the complementarity of the audio and image modalities, the method can determine whether a video has been forged even when the forgery is not easily perceived by humans, the
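
The three stages named in the summary (feature extraction, two parallel cross-attention branches, and a fused binary classifier) can be illustrated with a minimal, dependency-free Python sketch. This is not the patented implementation. It assumes the feature extractor has already produced per-frame audio and video vectors of the same width, it replaces each decoder with a single scaled dot-product cross-attention step without learned projections, and it uses mean pooling plus a logistic layer as a stand-in detector. All function names, shapes, and weights are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, context):
    """Scaled dot-product weights: one row per query, one column per context frame."""
    d = len(queries[0])
    scores = matmul(queries, list(zip(*context)))  # queries @ context^T
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def cross_attention(queries, context):
    """Queries come from one modality; keys and values come from the other."""
    return matmul(attention_weights(queries, context), context)


def mean_pool(seq):
    """Average a sequence of frame vectors into a single vector."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def detect(audio_feats, video_feats, weights, bias):
    """Return an assumed forgery probability from the fused two-branch features."""
    audio_branch = cross_attention(audio_feats, video_feats)  # audio attends to video
    video_branch = cross_attention(video_feats, audio_feats)  # video attends to audio
    fused = mean_pool(audio_branch) + mean_pool(video_branch)  # list concatenation
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs: 4 audio frames and 6 video frames, each an 8-dim feature vector.
rng = random.Random(0)
DIM = 8
audio = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
video = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]
clf_weights = [rng.gauss(0, 0.1) for _ in range(2 * DIM)]  # untrained, illustrative only

prob = detect(audio, video, clf_weights, bias=0.0)
print(f"forgery probability (untrained sketch): {prob:.3f}")
```

Because the classifier weights here are random, the printed probability carries no meaning; the sketch only shows how information flows between the two modalities before fusion. A trained system would add learned query, key, and value projections and would optimize the classifier on labelled real and forged clips.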