Learning Frame-Event Fusion for Motion Deblurring

Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2024, Vol. 33, pp. 6836-6849
Main authors: Yang, Wen; Wu, Jinjian; Ma, Jupo; Li, Leida; Dong, Weisheng; Shi, Guangming
Format: Article
Language: English
Description
Summary: Motion deblurring is a highly ill-posed problem because much of the motion information is lost in the blurring process. Complementary features from auxiliary sensors such as event cameras can be used to guide motion deblurring. An event camera captures rich motion information asynchronously with microsecond accuracy. In this paper, a novel frame-event fusion framework is proposed for event-driven motion deblurring (FEF-Deblur), which explores long-range cross-modal information interactions. First, different modalities are usually complementary but also partly redundant. Cross-modal fusion is therefore modeled as the separation and aggregation of complementary and unique features, which avoids modality redundancy. Unique features and complementary features are first inferred in parallel, with intra-modal self-attention and inter-modal cross-attention respectively. A correlation-based constraint is then applied between the unique and complementary features to facilitate their differentiation, which helps suppress cross-modal redundancy. Second, spatio-temporal dependencies among neighboring inputs are crucial for motion deblurring. A recurrent cross-attention is introduced to preserve inter-input attention information: the current spatial features and the aggregated temporal features attend to each other, establishing a long-range interaction between them. Extensive experiments on both synthetic and real-world motion deblurring datasets demonstrate that the method outperforms state-of-the-art event-based and image/video-based methods.
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2024.3512362
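
The separation-and-aggregation idea in the summary can be illustrated with a small, dependency-free sketch. It is not the FEF-Deblur implementation: the record gives no architectural details, so everything here is an assumption made for illustration. Features are plain lists of token vectors without learned projections, the function names (attention, pearson, separate_and_aggregate) are hypothetical, the correlation constraint is approximated by a squared Pearson correlation, and aggregation is a simple element-wise sum.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def pearson(x, y):
    """Pearson correlation of two equally long flat lists (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def flatten(tokens):
    return [v for token in tokens for v in token]


def separate_and_aggregate(frame_tokens, event_tokens):
    # Unique features: intra-modal self-attention on the frame tokens.
    unique = attention(frame_tokens, frame_tokens, frame_tokens)
    # Complementary features: inter-modal cross-attention, where frame
    # tokens query the event tokens.
    complementary = attention(frame_tokens, event_tokens, event_tokens)
    # Assumed stand-in for the correlation-based constraint: penalize
    # correlation between the two feature sets to push them apart.
    penalty = pearson(flatten(unique), flatten(complementary)) ** 2
    # Assumed aggregation: element-wise sum of the two feature sets.
    fused = [[u + c for u, c in zip(u_tok, c_tok)]
             for u_tok, c_tok in zip(unique, complementary)]
    return fused, penalty


# Toy inputs: three frame tokens and two event tokens, each of dimension 2.
frame_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
event_tokens = [[0.5, -0.5], [2.0, 1.0]]
fused, penalty = separate_and_aggregate(frame_tokens, event_tokens)
print(len(fused), len(fused[0]), penalty)
```

In a trained model the penalty would be one term of the loss, so that minimizing it discourages the self-attention and cross-attention branches from encoding the same information. The sum used for aggregation is only the simplest choice that keeps the sketch short.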