Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, genera...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Despite substantial progress, all-in-one image restoration (IR) grapples with
persistent challenges in handling intricate real-world degradations. This paper
introduces MPerceiver: a novel multimodal prompt learning approach that
harnesses Stable Diffusion (SD) priors to enhance adaptiveness,
generalizability and fidelity for all-in-one image restoration. Specifically,
we develop a dual-branch module to master two types of SD prompts: textual for
holistic representation and visual for multiscale detail representation. Both
prompts are dynamically adjusted by degradation predictions from the CLIP image
encoder, enabling adaptive responses to diverse unknown degradations. Moreover,
a plug-in detail refinement module improves restoration fidelity via direct
encoder-to-decoder information transformation. To assess our method, MPerceiver
is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art
task-specific methods across most tasks. Post multitask pre-training,
MPerceiver attains a generalized representation in low-level vision, exhibiting
remarkable zero-shot and few-shot capabilities in unseen tasks. Extensive
experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of
adaptiveness, generalizability and fidelity. |
---|---|
DOI: | 10.48550/arxiv.2312.02918 |