ReSParser: Fully Convolutional Multiple Human Parsing with Representative Sets

Multiple human parsing (MHP) is typically treated as two sub-tasks, i.e., instance separation and body part segmentation. Existing methods usually tackle the sub-tasks by adopting a two-stage strategy, which regards MHP as an ROI-based (i.e., detect-then-segment) or grouping-based (i.e., segment-the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on multimedia 2024-01, Vol.26, p.1-12
Hauptverfasser:	Dai, Yan, Chen, Xiaojia, Wang, Xuanhan, Pang, Minghui, Gao, Lianli, Shen, Heng Tao
Format:	Artikel
Sprache:	eng
Schlagworte:	Body parts Crops Detectors Human Instance-level Analysis Instance-aware modeling Kernel Multiple Human Parsing Pipelines Pose estimation Segments Semantics Task analysis Task complexity
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Multiple human parsing (MHP) is typically treated as two sub-tasks, i.e., instance separation and body part segmentation. Existing methods usually tackle the sub-tasks by adopting a two-stage strategy, which regards MHP as an ROI-based (i.e., detect-then-segment) or grouping-based (i.e., segment-then-grouping) paradigm. However, the strong dependence between the two sub-tasks limits the potential of an MHP method, since it often requires qualified prior predictions. Besides, isolated models responsible for the two sub-tasks bring a significant computational burden. Unlike existing methods, we regard MHP as a hierarchical set prediction problem and handle two sub-tasks using several landmarks of body parts. Motivated by this, we propose a novel multiple human parser with representative sets, termed ReSParser. In ReSParser, several landmarks of body parts are hierarchically estimated, resulting in coarse-to-fine representative sets. After that, each representative set is adaptively responsible for segmenting pixels into semantically consistent regions belonging to the corresponding person. In such a manner, the ReSParser simultaneously addresses two sub-tasks in a fully convolutional fashion, thus eliminating the dependence between two sub-tasks and significantly alleviating computational complexity. Extensive experiments on two challenging benchmarks demonstrate that our proposed ReSParser is an efficient framework with a superior parsing performance, which significantly outperforms that of other ROI-free yet grouping-free methods. Besides, it achieves competitive results to that of the best two-stage methods such as RP-RCNN, but requires a much lower inference time, showing a good precision-speed trade-off. Code and models are publicly available https://github.com/JosonChan1998/RepParser . We hope the ReSParser serves as a new baseline for multiple human parsing research in the future.
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2023.3281070