ReSParser: Fully Convolutional Multiple Human Parsing with Representative Sets
Multiple human parsing (MHP) is typically treated as two sub-tasks, i.e., instance separation and body part segmentation. Existing methods usually tackle the sub-tasks by adopting a two-stage strategy, which regards MHP as an ROI-based (i.e., detect-then-segment) or grouping-based (i.e., segment-the...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on multimedia 2024-01, Vol.26, p.1-12 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Multiple human parsing (MHP) is typically treated as two sub-tasks, i.e., instance separation and body part segmentation. Existing methods usually tackle the sub-tasks by adopting a two-stage strategy, which regards MHP as an ROI-based (i.e., detect-then-segment) or grouping-based (i.e., segment-then-grouping) paradigm. However, the strong dependence between the two sub-tasks limits the potential of an MHP method, since it often requires qualified prior predictions. Besides, isolated models responsible for the two sub-tasks bring a significant computational burden. Unlike existing methods, we regard MHP as a hierarchical set prediction problem and handle two sub-tasks using several landmarks of body parts. Motivated by this, we propose a novel multiple human parser with representative sets, termed ReSParser. In ReSParser, several landmarks of body parts are hierarchically estimated, resulting in coarse-to-fine representative sets. After that, each representative set is adaptively responsible for segmenting pixels into semantically consistent regions belonging to the corresponding person. In such a manner, the ReSParser simultaneously addresses two sub-tasks in a fully convolutional fashion, thus eliminating the dependence between two sub-tasks and significantly alleviating computational complexity. Extensive experiments on two challenging benchmarks demonstrate that our proposed ReSParser is an efficient framework with a superior parsing performance, which significantly outperforms that of other ROI-free yet grouping-free methods. Besides, it achieves competitive results to that of the best two-stage methods such as RP-RCNN, but requires a much lower inference time, showing a good precision-speed trade-off. Code and models are publicly available https://github.com/JosonChan1998/RepParser . We hope the ReSParser serves as a new baseline for multiple human parsing research in the future. |
---|---|
ISSN: | 1520-9210 1941-0077 |
DOI: | 10.1109/TMM.2023.3281070 |