PPformer: Using pixel-wise and patch-wise cross-attention for low-light image enhancement
Recently, transformer-based methods have shown strong competition compared to CNN-based methods on the low-light image enhancement task, by employing the self-attention for feature extraction. Transformer-based methods perform well in modeling long-range pixel dependencies, which are essential for l...
Gespeichert in:
Veröffentlicht in: | Computer vision and image understanding 2024-04, Vol.241, p.103930, Article 103930 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Recently, transformer-based methods have shown strong competition compared to CNN-based methods on the low-light image enhancement task, by employing the self-attention for feature extraction. Transformer-based methods perform well in modeling long-range pixel dependencies, which are essential for low-light image enhancement to achieve better lighting, natural colors, and higher contrast. However, the high computational cost of self-attention limits its development in low-light image enhancement, while some works struggle to balance accuracy and computational cost. In this work, we propose a lightweight and effective network based on the proposed pixel-wise and patch-wise cross-attention mechanism, PPformer, for low-light image enhancement. PPformer is a CNN-transformer hybrid network that is divided into three parts: local-branch, global-branch, and Dual Cross-Attention. Each part plays a vital role in PPformer. Specifically, the local-branch extracts local structural information using a stack of Wide Enhancement Modules, and the global-branch provides the refining global information by Cross Patch Module and Global Convolution Module. Besides, different from self-attention, we use extracted global semantic information to guide modeling dependencies between local and non-local. According to calculating Dual Cross-Attention, the PPformer can effectively restore images with better color consistency, natural brightness and contrast. Benefiting from the proposed dual cross-attention mechanism, PPformer effectively captures the dependencies in both pixel and patch levels for a full-size feature map. Extensive experiments on eleven real-world benchmark datasets show that PPformer achieves better quantitative and qualitative results than previous state-of-the-art methods.
•We proposed a lightweight model PPformer to extract information in both local and non-local for LLIE adaptively.•We proposed a dual cross-attention mechanism for efficiency and accuracy.•PPformer with only 95k parameters achieves promising results on eleven low-light image enhancement datasets. |
---|---|
ISSN: | 1077-3142 1090-235X |
DOI: | 10.1016/j.cviu.2024.103930 |