UP-Net: Uncertainty-supervised Parallel Network for Image Manipulation Localization

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2023-11, Vol. 33 (11), p. 1-1
Main authors: Xu, Dengyun; Shen, Xuanjing; Lyu, Yingda
Format: Article
Language: English
Description
Summary: Image manipulation localization remains a hot topic due to its inherent semantic-independent nature and realistic needs. Virtually all localization studies are devoted to solving arbitrary tampering using multi-branch networks based on deep features or skip-connection structures based on full features, which may induce the loss of manipulation details or noisy interference from image semantics. This makes it difficult for existing localization methods to fully capture invisible manipulations, especially in post-processing settings and cross-dataset scenarios. To address these issues, we propose an uncertainty-supervised parallel network (UP-Net) for image tampering localization that preserves more manipulation details while avoiding semantic noise. UP-Net cascades the frequency and RGB domains of the manipulated image into a dual-domain embedding, instead of learning the two domains in parallel as in previous work. To learn semantic-independent manipulation features, two structurally identical parallel branches are designed to learn tampering inconsistencies from intermediate and deep coding features, progressively producing the initial and final localization predictions. Here, an attention-guided partial decoder (AGPD) integrates more precise manipulation edges and manipulation semantics without introducing additional noise by focusing on channel correlation and spatial dependence, making a significant contribution to performance. Moreover, a new concept of uncertainty-constrained loss supervision is introduced to guide UP-Net to continuously improve its confidence in locating difficult pixels, which are easily misclassified due to post-processing operations. Experiments on three public manipulation datasets and two real challenge datasets show that our end-to-end UP-Net achieves significant performance in manipulation localization, generalization across datasets, and robustness compared to state-of-the-art methods.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2023.3269948