A Fast and Effective Transformer for Human Pose Estimation

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2022, Vol. 29, pp. 992-996
Main Authors: Wang, Dong; Xie, Wenjun; Cai, Youcheng; Liu, Xiaoping
Format: Article
Language: English
Online Access: Order full text
Description
Abstract: Most existing human pose estimation methods improve accuracy by constantly increasing computational resources. However, balancing the efficiency and efficacy of the model is the key to enhancing its practical value. In this work, we present a Fast and Effective Transformer model, called FET, that ensures both the efficiency and the efficacy of the model. Specifically, the FET consists of three parts: a Feature Extraction Module (FEM), a Feature Interaction Module (FIM), and a Feature Decode Module (FDM). The FEM efficiently extracts low-level features from input images. Unlike CNN-based strategies, the FIM enables our model to capture global dependencies through self-attention, thus improving the accuracy of human pose estimation. The FDM gradually recovers the spatial size of the features in multiple stages to obtain a higher-quality target heatmap. In addition, Feature Squeeze Attention is introduced in the FET to further improve the overall performance of our model. Extensive experiments show that our method is 1.7× and 7× faster than SimpleBaseline and HRNet-32, respectively, while achieving results comparable to or even better than state-of-the-art methods on the COCO and MPII datasets.
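
The abstract describes a three-stage pipeline (FEM → FIM → FDM). The following is a minimal PyTorch-style sketch intended only to make that data flow concrete; all layer widths, depths, and downsampling/upsampling ratios are assumptions for illustration, the class names (FETSketch, etc.) are hypothetical, and the Feature Squeeze Attention block is omitted because the record gives no implementation details.

```python
# Minimal sketch of the FEM -> FIM -> FDM pipeline described in the abstract.
# Layer sizes, depths, and ratios are assumed; they are not from the paper.
import torch
import torch.nn as nn


class FEM(nn.Module):
    """Feature Extraction Module: a small convolutional stem that extracts
    low-level features and downsamples the image by 16 (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )

    def forward(self, x):                       # (B, 3, H, W) -> (B, dim, H/16, W/16)
        return self.stem(x)


class FIM(nn.Module):
    """Feature Interaction Module: flattens the feature map into tokens and
    applies self-attention to capture global dependencies (depth assumed)."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class FDM(nn.Module):
    """Feature Decode Module: recovers spatial size in stages (two assumed)
    and predicts one heatmap per keypoint."""
    def __init__(self, dim=256, num_joints=17):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim // 2, dim // 4, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(dim // 4, num_joints, 1)

    def forward(self, x):                       # -> (B, num_joints, H/4, W/4)
        return self.head(self.up(x))


class FETSketch(nn.Module):
    """End-to-end sketch: low-level features -> global interaction -> heatmaps."""
    def __init__(self):
        super().__init__()
        self.fem, self.fim, self.fdm = FEM(), FIM(), FDM()

    def forward(self, img):
        return self.fdm(self.fim(self.fem(img)))


# Conventional COCO keypoint setting: 256x192 crop, 17 joints, 64x48 heatmaps.
heatmaps = FETSketch()(torch.randn(1, 3, 256, 192))   # -> (1, 17, 64, 48)
```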
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2022.3163678