PPformer: Using pixel-wise and patch-wise cross-attention for low-light image enhancement

Bibliographic Details
Published in: Computer Vision and Image Understanding, 2024-04, Vol. 241, p. 103930, Article 103930
Main Authors: Dang, Jiachen; Zhong, Yong; Qin, Xiaolin
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Recently, transformer-based methods have shown strong competitiveness against CNN-based methods on the low-light image enhancement task by employing self-attention for feature extraction. Transformer-based methods perform well in modeling long-range pixel dependencies, which are essential for low-light image enhancement to achieve better lighting, natural colors, and higher contrast. However, the high computational cost of self-attention limits its adoption in low-light image enhancement, and some works struggle to balance accuracy and computational cost. In this work, we propose PPformer, a lightweight and effective network for low-light image enhancement based on the proposed pixel-wise and patch-wise cross-attention mechanism. PPformer is a CNN-transformer hybrid network divided into three parts: a local branch, a global branch, and Dual Cross-Attention, each of which plays a vital role. Specifically, the local branch extracts local structural information using a stack of Wide Enhancement Modules, and the global branch provides refined global information through a Cross Patch Module and a Global Convolution Module. Moreover, unlike self-attention, we use the extracted global semantic information to guide the modeling of dependencies between local and non-local regions. By computing Dual Cross-Attention, PPformer can effectively restore images with better color consistency, natural brightness, and contrast. Benefiting from the proposed dual cross-attention mechanism, PPformer efficiently captures dependencies at both the pixel and patch levels over the full-size feature map. Extensive experiments on eleven real-world benchmark datasets show that PPformer achieves better quantitative and qualitative results than previous state-of-the-art methods.
•We propose a lightweight model, PPformer, that adaptively extracts both local and non-local information for LLIE.
•We propose a dual cross-attention mechanism for efficiency and accuracy.
•With only 95k parameters, PPformer achieves promising results on eleven low-light image enhancement datasets.
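To make the dual cross-attention idea in the abstract concrete, here is a minimal PyTorch sketch of cross-attention applied at the pixel level and at the patch level, with keys and values drawn from global semantic tokens, as the abstract describes. All module names (CrossAttention, DualCrossAttention), shapes, the patch pooling, and the fusion step are illustrative assumptions; the paper's Wide Enhancement Modules, Cross Patch Module, and Global Convolution Module are not reproduced here.

```python
# Hypothetical sketch of pixel-wise + patch-wise cross-attention; not the
# authors' implementation. Queries come from local features, keys/values
# from global semantic tokens, mirroring the guidance described above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttention(nn.Module):
    """Scaled dot-product cross-attention: queries from x, keys/values from ctx."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, 2 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, ctx):                    # x: (B, N, C), ctx: (B, M, C)
        B, N, C = x.shape
        h = self.heads
        q = self.to_q(x).reshape(B, N, h, C // h).transpose(1, 2)     # (B, h, N, C/h)
        k, v = self.to_kv(ctx).chunk(2, dim=-1)
        k = k.reshape(B, -1, h, C // h).transpose(1, 2)               # (B, h, M, C/h)
        v = v.reshape(B, -1, h, C // h).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale                 # (B, h, N, M)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class DualCrossAttention(nn.Module):
    """Pixel-wise and patch-wise cross-attention against global semantic tokens.

    Pixel level: every pixel of the local feature map attends to the global
    tokens. Patch level: non-overlapping patches are pooled into tokens that
    attend to the same global context, then broadcast back to full size.
    """

    def __init__(self, dim, patch=8, heads=4):
        super().__init__()
        self.patch = patch
        self.pixel_attn = CrossAttention(dim, heads)
        self.patch_attn = CrossAttention(dim, heads)

    def forward(self, local_feat, global_tokens):  # (B, C, H, W), (B, M, C)
        B, C, H, W = local_feat.shape
        # Pixel-wise branch: each pixel is a query token.
        pixels = local_feat.flatten(2).transpose(1, 2)                # (B, H*W, C)
        pixels = pixels + self.pixel_attn(pixels, global_tokens)
        # Patch-wise branch: pooled patch tokens are the queries.
        patches = F.adaptive_avg_pool2d(local_feat, (H // self.patch, W // self.patch))
        P = patches.flatten(2).transpose(1, 2)                        # (B, P, C)
        P = P + self.patch_attn(P, global_tokens)
        patch_map = P.transpose(1, 2).reshape(B, C, H // self.patch, W // self.patch)
        patch_map = F.interpolate(patch_map, size=(H, W), mode="nearest")
        # Fuse both granularities back into a full-size feature map.
        return pixels.transpose(1, 2).reshape(B, C, H, W) + patch_map


if __name__ == "__main__":
    dca = DualCrossAttention(dim=32, patch=8)
    x = torch.randn(2, 32, 64, 64)   # local-branch features (assumed shape)
    g = torch.randn(2, 16, 32)       # global semantic tokens (assumed shape)
    print(dca(x, g).shape)           # torch.Size([2, 32, 64, 64])
```

Note the cost profile this sketch implies: because queries attend to a small set of global tokens rather than to every other pixel, attention is linear in the number of pixels instead of quadratic, which is consistent with the abstract's stated goal of balancing accuracy against the computational cost of full self-attention.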
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2024.103930