An Efficient Multiscale Spatial Rearrangement MLP Architecture for Image Restoration

Bibliographic details
Published in: IEEE Transactions on Image Processing, 2024, Vol. 33, pp. 423-438
Main authors: Hua, Xia; Li, Zezheng; Hong, Hanyu
Format: Article
Language: English
Description
Summary: The effective use of long-range information can yield improved network performance, which is very important for image restoration. Although local window-based models have linear complexity and can feasibly process high-resolution images, a single-scale window has a limited receptive field and is less efficient for encoding long-range context information. To address this issue, this paper presents a single-stage multiscale spatial rearrangement multilayer perceptron (MSSR-MLP) architecture that can obtain information at different scales within a local window. Specifically, we propose a simple and efficient spatial rearrangement module (SRM) that moves information outside the local window to the inside of the local window so that long-range dependencies can be modeled using only a window-based fully connected (FC) layer. The SRM can extend the local receptive field of a window-based FC layer without introducing additional parameters and FLOPs. Utilizing several spatial rearrangement modules with different step sizes, we design an efficient multiscale spatial rearrangement MLP architecture for image restoration. This design aggregates multiscale information to achieve improved restoration quality while maintaining a low computational cost. Extensive experiments conducted on several image restoration tasks demonstrate the efficiency and effectiveness of our method. For example, it requires only ~4.3% of the FLOPs needed by SwinIR for Gaussian gray image denoising, ~13.9% of the FLOPs needed by C²PNet for single-image dehazing, and ~18.9% of the FLOPs needed by MAXIM for single-image motion deblurring, yet it achieves better performance on each of these restoration tasks.
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2023.3341700
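
Note on the summary: the idea of moving information from outside a local window to the inside, at no cost in parameters or FLOPs, can be illustrated with a pure index permutation. The sketch below is only one possible reading of that idea and is not taken from the article; the function names (`coordinate_grid`, `rearrange`, `restore`, `top_left_window`) and the specific "gather every step-th pixel" rule are assumptions made for illustration, and the article's actual SRM may work differently. Each toy pixel stores its own coordinates, so the output shows which original positions end up inside one 2x2 window for several step sizes.

```python
def coordinate_grid(height, width):
    """Build a toy 'image' whose pixels store their own (row, col) position."""
    return [[(r, c) for c in range(width)] for r in range(height)]


def _phase_order(size, step):
    """Indices 0..size-1 regrouped so entries `step` apart become neighbours."""
    if size % step:
        raise ValueError("size must be divisible by step")
    return [i for phase in range(step) for i in range(phase, size, step)]


def rearrange(image, step):
    """Hypothetical rearrangement: a permutation of rows and columns only.

    No arithmetic is performed on pixel values, so this adds no parameters
    and no multiply-accumulate operations (an assumption-level analogue of
    the property claimed for the SRM in the summary).
    """
    rows = _phase_order(len(image), step)
    cols = _phase_order(len(image[0]), step)
    return [[image[r][c] for c in cols] for r in rows]


def restore(image, step):
    """Invert `rearrange`, putting every pixel back at its original position."""
    rows = _phase_order(len(image), step)
    cols = _phase_order(len(image[0]), step)
    out = [[None] * len(cols) for _ in rows]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[r][c] = image[i][j]
    return out


def top_left_window(image, window):
    """Flatten the top-left window, as a window-based FC layer would see it."""
    return [image[r][c] for r in range(window) for c in range(window)]


if __name__ == "__main__":
    grid = coordinate_grid(8, 8)
    # Several step sizes give the same 2x2 window several spatial scales.
    for step in (1, 2, 4):
        print(step, top_left_window(rearrange(grid, step), 2))
```

With step 1 the window holds four adjacent pixels; with step 4 the same 2x2 window holds pixels four positions apart, so a fully connected layer applied per window would mix information over a wider area. Running several step sizes side by side and merging the results is one way to picture the multiscale aggregation the summary describes.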