Human Body Segmentation in Wide-Angle Images Based on Fast Vision Transformers
Published in: IEEE Access, 2024, Vol. 12, pp. 178971-178981
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Achieving effective and efficient segmentation of human body regions in distorted images is of practical significance. Current methods rely on transformers to extract discriminative features. However, due to the unique global attention mechanism, existing transformers lack detailed image features and incur high computational costs, resulting in subpar segmentation accuracy and slow inference. In this paper, we introduce the Human Spatial Prior Module (HSPM) and the Dynamic Token Pruning Module (DTPM). The HSPM is specifically designed to capture human features in distorted images, using dynamic methods to extract highly variable details. The DTPM accelerates inference by pruning unimportant tokens from each layer of the Vision Transformer (ViT). Unlike traditional pruning approaches, the pruned tokens are preserved in feature maps and selectively reactivated in subsequent network layers to improve model performance. To validate the effectiveness of the Vision Transformer in Distorted Image (ViT-DI), we extend the ADE20K dataset and conduct experiments on the constructed dataset and the Cityscapes dataset. Our method achieves an mIoU increase of 1.6 and an FPS increase of 4.4 on the ADE20K dataset, and an mIoU increase of 0.77 and an FPS increase of 2.9 on the Cityscapes dataset, while reducing computational cost by approximately 130 GFLOPs. Our dataset is available at: https://github.com/GitHubYuxiao/ViT-DI .
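The abstract describes the DTPM as pruning unimportant tokens at each ViT layer while keeping the pruned tokens in reserve so later layers can selectively reactivate them. The sketch below illustrates that general idea only; it is not the authors' implementation. The class name `TokenPruningLayer`, the `keep_ratio` parameter, the importance scorer, and the simple first-in reactivation rule are all illustrative assumptions.

```python
# Minimal sketch of per-layer token pruning with a reserve of pruned tokens
# that later layers can reactivate. Hypothetical, not the ViT-DI implementation.
import torch
import torch.nn as nn


class TokenPruningLayer(nn.Module):
    """One transformer block wrapped with score-based token pruning (assumed design)."""

    def __init__(self, dim: int, num_heads: int = 8, keep_ratio: float = 0.7):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.scorer = nn.Linear(dim, 1)  # predicts per-token importance
        self.keep_ratio = keep_ratio

    def forward(self, tokens, reserve):
        # tokens:  (B, N, C) currently active tokens
        # reserve: (B, M, C) tokens pruned by earlier layers
        scores = self.scorer(tokens).squeeze(-1)                 # (B, N)
        n_keep = max(1, int(tokens.shape[1] * self.keep_ratio))
        keep_idx = scores.topk(n_keep, dim=1).indices            # most important tokens
        drop_idx = scores.topk(tokens.shape[1] - n_keep, dim=1,
                               largest=False).indices            # least important tokens

        def gather(x, idx):
            return torch.gather(
                x, 1, idx.unsqueeze(-1).expand(-1, -1, x.shape[-1]))

        kept = gather(tokens, keep_idx)
        pruned = gather(tokens, drop_idx)

        # Reactivate a few reserved tokens from earlier layers (here simply the
        # first k in the reserve), then stash the newly pruned tokens for later use.
        k = min(reserve.shape[1], 4)
        active = torch.cat([kept, reserve[:, :k]], dim=1)
        new_reserve = torch.cat([reserve[:, k:], pruned], dim=1)

        return self.block(active), new_reserve


if __name__ == "__main__":
    x = torch.randn(2, 196, 256)        # B=2, 14x14 patch tokens, C=256
    reserve = torch.zeros(2, 0, 256)    # empty reserve at the first layer
    layer = TokenPruningLayer(dim=256)
    out, reserve = layer(x, reserve)
    print(out.shape, reserve.shape)     # fewer active tokens, growing reserve
```

Because pruned tokens are only set aside rather than discarded, the active sequence shrinks (cutting attention cost) while later layers retain the option to bring tokens back, which is the behavior the abstract attributes to the DTPM.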
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3507272