Effective Light Field De-Occlusion Network Based on Swin Transformer

Bibliographic Details
Published in:	IEEE transactions on circuits and systems for video technology 2023-06, Vol.33 (6), p.2590-2599
Main Authors:	Wang, Xingzheng, Liu, Jiehao, Chen, Songwei, Wei, Guoyao
Format:	Article
Language:	English
Subjects:	Algorithms; Convolution; convolutional neural network (CNN); Convolutional neural networks; Datasets; Feature extraction; global and local receptive fields; Image restoration; Light field de-occlusion; Object detection; Occlusion; Performance degradation; Performance evaluation; Swin transformer; Target detection; Task analysis; Transformers

Description
Summary:	Existing CNN-based light field de-occlusion (LF-DeOcc) methods suffer degraded occlusion removal performance in the presence of large-size occlusions. In this paper, we infer that this is possibly caused by the limited receptive field of CNNs and experimentally demonstrate that de-occlusion performance is strongly related to the receptive field. Therefore, a novel LF-DeOcc network based on Swin Transformer and CNN, which aims to exploit both global and local receptive fields, is proposed for the first time for the light field de-occlusion task. CNNs are employed at shallow layers to compensate for the deficiency of Transformers in extracting local features, while Transformers are employed at deep layers to capture the global patterns of large-size occlusions. Hence, by integrating the global and local features, occlusion-free images can be restored effectively. For performance evaluation, a large dataset with mild-to-severe occlusions is developed and tested, with average occlusion rates of 19.82%, 32.13% and 40.56%. Experimental results show that the proposed network is superior to state-of-the-art methods, achieving 27.87 dB and 29.46 dB on the public dataset and our developed dataset, respectively. Finally, a new evaluation method is presented that uses a real target detection task to evaluate the performance of LF de-occlusion algorithms. The practicability of our algorithm is validated using the new evaluation method.
ISSN:	1051-8215, 1558-2205
DOI:	10.1109/TCSVT.2022.3226227
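
The summary above reports occlusion rates in percent and restoration quality in dB. The record does not define either measure, so the following is only a minimal sketch under two assumptions: that an occlusion rate is the fraction of pixels marked as occluded in a binary mask, and that the dB figures are peak signal-to-noise ratios (PSNR). The function names occlusion_rate and psnr are hypothetical and are not taken from the paper.

```python
import math


def occlusion_rate(mask):
    """Fraction of pixels flagged as occluded in a binary mask (list of rows).

    Assumption: a truthy value marks an occluded pixel.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    occluded = sum(1 for row in mask for value in row if value)
    return occluded / total


def psnr(reference, restored, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows).

    Assumption: pixel values lie in [0, peak]; identical images give infinity.
    """
    if len(reference) != len(restored):
        raise ValueError("images have different heights")
    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        if len(ref_row) != len(res_row):
            raise ValueError("images have different widths")
        for ref_px, res_px in zip(ref_row, res_row):
            squared_error += (ref_px - res_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images are empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    toy_mask = [[1, 0], [0, 0]]          # one occluded pixel out of four
    clean = [[10, 20], [30, 40]]         # toy ground-truth view
    output = [[12, 20], [30, 38]]        # toy de-occluded result
    print(f"occlusion rate: {occlusion_rate(toy_mask):.2%}")
    print(f"PSNR: {psnr(clean, output):.2f} dB")
```

Under these assumptions, a higher PSNR means the restored view is closer to the occlusion-free reference, and a higher occlusion rate means a larger share of the view is hidden.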