RamIR: Reasoning and action prompting with Mamba for all-in-one image restoration

Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2025-02, Vol.55 (4), p.258, Article 258
Main authors: Tang, Aiqiang; Wu, Yan; Zhang, Yuwei
Format: Article
Language: English
Description
Summary: All-in-one image restoration aims to recover various degraded images using a unified model. To adaptively reconstruct high-quality images, recent CNN- and Transformer-based models incorporate learnable prompts to dynamically acquire degradation-specific knowledge for different degraded images, achieving state-of-the-art restoration performance. However, existing methods exhibit limitations, including high computational burden and inadequate modeling of long-range dependencies. To address these issues, we propose a reasoning and action prompt-driven Mamba-based image restoration model, namely RamIR. Specifically, RamIR employs the Mamba block to model long-range dependencies with computational complexity that is linear in the feature map size. Inspired by Chain-of-Thought (CoT) prompting, we integrate Reasoning and Action (ReAct) prompts within the Mamba block. To this end, we use pretrained vision-language (PVL) models to generate textual reasoning prompts describing the type and severity of degradations. Simultaneously, another output from the PVL model acts as the action prompt, representing the clean image caption. These prompts, employed in a CoT manner, enhance the network’s sensitivity to degradation and elicit targeted recovery actions tailored to different reasoning prompts. Additionally, we explore the seamless interaction between Mamba blocks and prompts, introducing a novel prompt-driven module (PDM) to facilitate prompt utilization. Extensive experimental results demonstrate the superior performance of RamIR, highlighting its advantages in terms of input scaling efficiency over existing benchmark models for all-in-one image restoration.
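
Note on the linear-complexity claim: the summary states that the Mamba block models long-range dependencies at a cost linear in the feature map size. The sketch below is not the RamIR or Mamba implementation. It is a deliberately simplified, non-selective scalar state space recurrence, offered only to illustrate why a scan over a flattened feature map takes one step per position. The function names (ssm_scan, flatten_feature_map) and the parameters a, b, c are hypothetical and chosen for illustration; real Mamba blocks use input-dependent parameters and a hardware-aware parallel scan.

```python
def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t

    Returns the outputs and the number of recurrence steps performed.
    The parameters a, b, c are fixed here (an assumption for simplicity);
    Mamba makes them depend on the input.
    """
    h = 0.0
    ys = []
    steps = 0
    for x_t in x:
        h = a * h + b * x_t   # the state carries information from all earlier positions
        ys.append(c * h)
        steps += 1            # exactly one update per sequence position
    return ys, steps


def flatten_feature_map(fmap):
    """Flatten an H x W feature map (a list of rows) in row-major order."""
    return [v for row in fmap for v in row]


if __name__ == "__main__":
    # The step count grows with H * W, not with (H * W) ** 2.
    for side in (4, 8, 16):
        fmap = [[1.0] * side for _ in range(side)]
        _, steps = ssm_scan(flatten_feature_map(fmap))
        print(side * side, steps)
```

Because the hidden state is updated once per position, the work grows in proportion to the number of positions, while an input at the first position can still influence the output at the last one through the decaying state. Self-attention, by contrast, compares every position with every other position.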
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-024-06226-y