Human Parsing With Pyramidical Gather-Excite Context

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 31, No. 3, pp. 1016-1030, March 2021
Authors: Zhang, Sanyi; Qi, Guo-Jun; Cao, Xiaochun; Song, Zhanjie; Zhou, Jie
Format: Article
Language: English
Description
Abstract: Human parsing, especially in the wild, has attracted considerable attention because of its potential in many real-world applications. The Pyramid Scene Parsing (PSP) module has shown superior performance in scene and human parsing tasks. However, the basic AvgPool operation in PSP aggregates the spatial clues of a local region with equal weights, and thus mixes up the influences of the different human parts present in that region. As a result, it fails to capture the contexts relevant to parsing each part. To address this problem, this paper proposes a mechanism that collects spatial clues aligned with different human parts. We employ a Gather-Excite (GE) operation as a replacement for the AvgPool-Upsample operation in a pyramidical structure, so that relevant human parts of various scales are accurately reflected. The GE operation consists of two steps: a gather operation that adaptively aggregates spatial clues with respect to the relevant human parts, and an excite operation that generates new feature maps from the gathered contextual information. The result is a novel Pyramidical Gather-Excite Context (PGEC) module that addresses the multi-scale problem and parses persons at various scales. The PGEC module is composed of multiple GE operations with different spatial extents, which aggregate local and global spatial clues in parallel to better model multi-scale contextual information. Moreover, we integrate the PGEC module with fine-grained details, an edge-preserving module and deep supervision to form a novel PGEC Network (PGECNet) for human parsing. PGECNet achieves state-of-the-art performance on four single-person human parsing datasets (i.e., LIP, PPSS, ATR and Fashion Clothing) and two multi-person human parsing datasets (i.e., PASCAL-Person-Part and CIHP). The experimental results show that the proposed PGEC module outperforms the PSP and ASPP modules, especially in the single-person human parsing task.
The source code is publicly available at https://github.com/31sy/PGECNet.
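The gather and excite steps described in the abstract can be illustrated with a small, framework-free Python sketch. It assumes the formulation of the original Gather-Excite operator (Hu et al., NeurIPS 2018): the gather step collapses each non-overlapping extent x extent window with a strided depthwise weighting, and the excite step upsamples the result by nearest-neighbour interpolation, applies a sigmoid and gates the input element-wise. Everything here is hypothetical and for illustration only: the kernels are fixed numbers rather than learned parameters, a single channel stands in for a full feature tensor, the extents are arbitrary, and the function names are not taken from the PGECNet code at https://github.com/31sy/PGECNet, which remains the reference for the actual layers.

```python
import math
from typing import Dict, List, Sequence

Map = List[List[float]]  # a single H x W feature channel (hypothetical stand-in for a tensor)


def uniform_kernel(extent: int) -> Map:
    """Equal weights 1/extent^2: gathering with this kernel is plain AvgPool."""
    return [[1.0 / (extent * extent)] * extent for _ in range(extent)]


def gather(x: Map, extent: int, kernel: Map) -> Map:
    """Gather step: collapse each non-overlapping extent x extent window of x
    into one value, weighting positions by `kernel` (a strided depthwise conv).
    Partial windows at the border use only the positions that exist."""
    h, w = len(x), len(x[0])
    gh, gw = math.ceil(h / extent), math.ceil(w / extent)
    out = [[0.0] * gw for _ in range(gh)]
    for i in range(gh):
        for j in range(gw):
            acc = 0.0
            for di in range(extent):
                for dj in range(extent):
                    r, c = i * extent + di, j * extent + dj
                    if r < h and c < w:
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def excite(x: Map, gathered: Map, extent: int) -> Map:
    """Excite step: nearest-neighbour upsample the gathered map back to the
    size of x, pass it through a sigmoid and gate x element-wise."""
    return [
        [x[r][c] * sigmoid(gathered[r // extent][c // extent]) for c in range(len(x[0]))]
        for r in range(len(x))
    ]


def gather_excite(x: Map, extent: int, kernel: Map) -> Map:
    """One GE operation with the given spatial extent."""
    return excite(x, gather(x, extent, kernel), extent)


def pgec_branches(x: Map, extents: Sequence[int], kernels: Dict[int, Map]) -> List[Map]:
    """Pyramid of GE operations applied in parallel to the same input.
    A real network would concatenate these channel-wise and fuse them with convs."""
    return [x] + [gather_excite(x, e, kernels[e]) for e in extents]


if __name__ == "__main__":
    x = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 feature map
    print(gather(x, 2, uniform_kernel(2)))  # AvgPool-like gather: [[2.5, 4.5], [10.5, 12.5]]
    part_aware = {2: [[0.7, 0.1], [0.1, 0.1]], 4: uniform_kernel(4)}  # hypothetical weights
    for branch in pgec_branches(x, [2, 4], part_aware):
        print([[round(v, 2) for v in row] for row in branch])
```

With `uniform_kernel`, `gather` is exactly the equal-weight average pooling that the abstract identifies as the weakness of PSP; a non-uniform kernel, such as the hypothetical one in the demo, lets the gather favour some positions in a window over others, which is the property a learned, part-aware gather relies on. An extent equal to the full map size yields a single global value, so mixing small and large extents gives the local and global branches of the pyramid. PSP itself defines its pyramid by output bin counts (adaptive pooling) rather than by window extents, so the comparison here is only an analogy.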
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2020.2990531