Multi-modal fusion detection method for deeply-forged audio and video
Main authors: | , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | The invention discloses a multi-modal fusion detection method for deeply forged audio and video, in the field of multi-modal machine learning. The method constructs a network architecture from a temporal-spatial feature extractor, a cross-attention cross-modal joint learning decoder, and a multi-modal classification detector to perform audio-video multi-modal identification. The temporal-spatial feature extractor processes the audio and video modal features in a unified way; the cross-attention cross-modal joint learning decoder lets the two modalities learn jointly through two parallel decoders; and the multi-modal classification detector fuses the feature information of the two modalities and outputs a binary classification result. By exploiting the complementarity of the audio and image modalities, the method can distinguish whether videos are forged even when the forgery is not easily perceived by humans, the |
---|
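
The data flow described in the summary (two parallel cross-attention branches followed by a fused binary classifier) can be sketched roughly as follows. This is only an illustrative sketch and not the patented implementation: it assumes that the temporal-spatial feature extractor has already produced per-frame audio and video features of a shared dimension `d`, it uses single-head attention with random, untrained weights, and the names `cross_attention`, `detect`, and `params`, as well as mean pooling as the fusion step, are hypothetical choices that do not come from the record.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross attention: `queries` (one modality) attend to `context` (the other)."""
    q = queries @ w_q                          # (Tq, d)
    k = context @ w_k                          # (Tc, d)
    v = context @ w_v                          # (Tc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Tq, Tc)
    return softmax(scores, axis=-1) @ v        # (Tq, d)

def detect(audio_feats, video_feats, params):
    """Return a probability that the clip is forged (hypothetical sketch)."""
    # Two parallel branches: audio attends to video, and video attends to audio
    a2v = cross_attention(audio_feats, video_feats, *params["audio_branch"])
    v2a = cross_attention(video_feats, audio_feats, *params["video_branch"])
    # Assumed fusion: mean-pool each branch over time, then concatenate
    fused = np.concatenate([a2v.mean(axis=0), v2a.mean(axis=0)])
    logit = fused @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid for the binary output

# Random stand-ins for extracted features and for (untrained) weights
rng = np.random.default_rng(0)
d = 16
params = {
    "audio_branch": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
    "video_branch": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
    "w_out": rng.normal(scale=0.1, size=2 * d),
    "b_out": 0.0,
}
audio = rng.normal(size=(20, d))   # 20 audio frames (assumed)
video = rng.normal(size=(8, d))    # 8 video frames (assumed)
p_forged = detect(audio, video, params)
print("forged" if p_forged > 0.5 else "real")
```

With untrained weights the printed label carries no meaning; the sketch only shows how each modality queries the other before the two branch outputs are fused into a single binary decision.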