Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation

The goal of 2D human pose estimation is to locate the keypoints of body parts in input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for training convolutional neural networks, which are usually initialized randomly or using...
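The pixel-wise heatmap labels mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian peak centred on each keypoint, which is a common choice but is not stated in this record; the function name `keypoint_heatmap`, the `sigma` value and the 64 x 48 heatmap size are hypothetical and chosen only for illustration.

```python
import math


def keypoint_heatmap(x, y, height, width, sigma=2.0):
    """Build a height x width heatmap (list of rows) peaking at keypoint (x, y).

    Assumption: an unnormalised Gaussian whose value is 1.0 at the keypoint.
    """
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(width)
        ]
        for row in range(height)
    ]


# One heatmap per keypoint; here a single keypoint at column 12, row 20.
heatmap = keypoint_heatmap(x=12, y=20, height=64, width=48)
```

A pose network trained on such labels regresses one heatmap channel per keypoint, and the predicted keypoint is typically read off as the location of the maximum value.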

Bibliographic details
Main authors:	Zhang, Kun; Wu, Rui; Yao, Ping; Deng, Kai; Li, Ding; Liu, Renbiao; Yang, Chuanguang; Chen, Ge; Du, Min; Zheng, Tianyao
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Zhang, Kun ; Wu, Rui ; Yao, Ping ; Deng, Kai ; Li, Ding ; Liu, Renbiao ; Yang, Chuanguang ; Chen, Ge ; Du, Min ; Zheng, Tianyao
description	The goal of 2D human pose estimation is to locate the keypoints of body parts in input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for training convolutional neural networks, whose backbones are usually initialized either randomly or from classification models pretrained on ImageNet. We note that the 2D pose estimation task is highly dependent on the contextual relationship between image patches, and we therefore introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose the Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext task, whose target is to learn the location of each patch from an image composed of shuffled patches. During pretraining, we use only images of person instances in MS-COCO, rather than introducing the extra and much larger ImageNet dataset. A heatmap-style label for patch location is designed, and our learning process is non-contrastive. The weights learned by the HSJP pretext task are used as the backbones of 2D human pose estimators, which are then finetuned on the MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate the mAP score on both the MS-COCO validation and test-dev datasets. Our experiments show that downstream pose estimators with our self-supervised pretraining obtain much better performance than those trained from scratch, and are comparable to those using ImageNet classification models as their initial backbones.
doi_str_mv	10.48550/arxiv.2012.07101
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2012_07101</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2012_07101</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-a6148f00904b37d26e01afc941bfbca8a74c998127beae04f0bdb92ae0fc465f3</originalsourceid><addsrcrecordid>eNotj8tOwzAQRb1hgQofwAr_QMLYcV5LVEoDikQkukSKxoldWUriynYL7dcTUjYz5y7ujA4hDwxiUaQpPKH7MaeYA-Mx5AzYLfmqFbrJTHtaKQwjHqLPcB4UfTd7j9-0OV4ug_K0cfZk-hm21vZzUsGhWWraOspfaHUccaKN9YpufDAjBmOnO3KjcfDq_n-vyO51s1tXUf2xfVs_1xFmOZsHE4UGKEHIJO95poCh7krBpJYdFpiLriwLxnOpUIHQIHtZ8hl1J7JUJyvyeD276LUHN7935_ZPs100k18-6E4g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation</title><source>arXiv.org</source><creator>Zhang, Kun ; Wu, Rui ; Yao, Ping ; Deng, Kai ; Li, Ding ; Liu, Renbiao ; Yang, Chuanguang ; Chen, Ge ; Du, Min ; Zheng, Tianyao</creator><creatorcontrib>Zhang, Kun ; Wu, Rui ; Yao, Ping ; Deng, Kai ; Li, Ding ; Liu, Renbiao ; Yang, Chuanguang ; Chen, Ge ; Du, Min ; Zheng, Tianyao</creatorcontrib><description>The target of 2D human pose estimation is to locate the keypoints of body parts from input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for learning convolution neural networks, which are usually initialized randomly or using classification models on ImageNet as their backbones. We note that 2D pose estimation task is highly dependent on the contextual relationship between image patches, thus we introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext-task, whose target is to learn the location of each patch from an image composed of shuffled patches. 
During our pretraining process, we only use images of person instances in MS-COCO, rather than introducing extra and much larger ImageNet dataset. A heatmap-style label for patch location is designed and our learning process is in a non-contrastive way. The weights learned by HSJP pretext task are utilised as backbones of 2D human pose estimator, which are then finetuned on MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate mAP score on both MS-COCO validation and test-dev datasets. Our experiments show that downstream pose estimators with our self-supervised pretraining obtain much better performance than those trained from scratch, and are comparable to those using ImageNet classification models as their initial backbones.</description><identifier>DOI: 10.48550/arxiv.2012.07101</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2020-12</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2012.07101$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2012.07101$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Kun</creatorcontrib><creatorcontrib>Wu, Rui</creatorcontrib><creatorcontrib>Yao, Ping</creatorcontrib><creatorcontrib>Deng, Kai</creatorcontrib><creatorcontrib>Li, Ding</creatorcontrib><creatorcontrib>Liu, Renbiao</creatorcontrib><creatorcontrib>Yang, Chuanguang</creatorcontrib><creatorcontrib>Chen, 
Ge</creatorcontrib><creatorcontrib>Du, Min</creatorcontrib><creatorcontrib>Zheng, Tianyao</creatorcontrib><title>Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation</title><description>The target of 2D human pose estimation is to locate the keypoints of body parts from input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for learning convolution neural networks, which are usually initialized randomly or using classification models on ImageNet as their backbones. We note that 2D pose estimation task is highly dependent on the contextual relationship between image patches, thus we introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext-task, whose target is to learn the location of each patch from an image composed of shuffled patches. During our pretraining process, we only use images of person instances in MS-COCO, rather than introducing extra and much larger ImageNet dataset. A heatmap-style label for patch location is designed and our learning process is in a non-contrastive way. The weights learned by HSJP pretext task are utilised as backbones of 2D human pose estimator, which are then finetuned on MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate mAP score on both MS-COCO validation and test-dev datasets. 
Our experiments show that downstream pose estimators with our self-supervised pretraining obtain much better performance than those trained from scratch, and are comparable to those using ImageNet classification models as their initial backbones.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8tOwzAQRb1hgQofwAr_QMLYcV5LVEoDikQkukSKxoldWUriynYL7dcTUjYz5y7ujA4hDwxiUaQpPKH7MaeYA-Mx5AzYLfmqFbrJTHtaKQwjHqLPcB4UfTd7j9-0OV4ug_K0cfZk-hm21vZzUsGhWWraOspfaHUccaKN9YpufDAjBmOnO3KjcfDq_n-vyO51s1tXUf2xfVs_1xFmOZsHE4UGKEHIJO95poCh7krBpJYdFpiLriwLxnOpUIHQIHtZ8hl1J7JUJyvyeD276LUHN7935_ZPs100k18-6E4g</recordid><startdate>20201213</startdate><enddate>20201213</enddate><creator>Zhang, Kun</creator><creator>Wu, Rui</creator><creator>Yao, Ping</creator><creator>Deng, Kai</creator><creator>Li, Ding</creator><creator>Liu, Renbiao</creator><creator>Yang, Chuanguang</creator><creator>Chen, Ge</creator><creator>Du, Min</creator><creator>Zheng, Tianyao</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20201213</creationdate><title>Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation</title><author>Zhang, Kun ; Wu, Rui ; Yao, Ping ; Deng, Kai ; Li, Ding ; Liu, Renbiao ; Yang, Chuanguang ; Chen, Ge ; Du, Min ; Zheng, Tianyao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-a6148f00904b37d26e01afc941bfbca8a74c998127beae04f0bdb92ae0fc465f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Kun</creatorcontrib><creatorcontrib>Wu, Rui</creatorcontrib><creatorcontrib>Yao, 
Ping</creatorcontrib><creatorcontrib>Deng, Kai</creatorcontrib><creatorcontrib>Li, Ding</creatorcontrib><creatorcontrib>Liu, Renbiao</creatorcontrib><creatorcontrib>Yang, Chuanguang</creatorcontrib><creatorcontrib>Chen, Ge</creatorcontrib><creatorcontrib>Du, Min</creatorcontrib><creatorcontrib>Zheng, Tianyao</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Kun</au><au>Wu, Rui</au><au>Yao, Ping</au><au>Deng, Kai</au><au>Li, Ding</au><au>Liu, Renbiao</au><au>Yang, Chuanguang</au><au>Chen, Ge</au><au>Du, Min</au><au>Zheng, Tianyao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation</atitle><date>2020-12-13</date><risdate>2020</risdate><abstract>The target of 2D human pose estimation is to locate the keypoints of body parts from input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for learning convolution neural networks, which are usually initialized randomly or using classification models on ImageNet as their backbones. We note that 2D pose estimation task is highly dependent on the contextual relationship between image patches, thus we introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext-task, whose target is to learn the location of each patch from an image composed of shuffled patches. During our pretraining process, we only use images of person instances in MS-COCO, rather than introducing extra and much larger ImageNet dataset. A heatmap-style label for patch location is designed and our learning process is in a non-contrastive way. 
The weights learned by HSJP pretext task are utilised as backbones of 2D human pose estimator, which are then finetuned on MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate mAP score on both MS-COCO validation and test-dev datasets. Our experiments show that downstream pose estimators with our self-supervised pretraining obtain much better performance than those trained from scratch, and are comparable to those using ImageNet classification models as their initial backbones.</abstract><doi>10.48550/arxiv.2012.07101</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2012.07101
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2012_07101
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T06%3A14%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20Heatmap-Style%20Jigsaw%20Puzzles%20Provides%20Good%20Pretraining%20for%202D%20Human%20Pose%20Estimation&rft.au=Zhang,%20Kun&rft.date=2020-12-13&rft_id=info:doi/10.48550/arxiv.2012.07101&rft_dat=%3Carxiv_GOX%3E2012_07101%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
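
The jigsaw pretext task summarised in the description field (an image composed of shuffled patches, with the target being the location of each patch) can be sketched as follows. This is only an illustration under stated assumptions: the record does not give the grid size, the patch sampling, or the exact form of the heatmap-style label, so the 3 x 3 grid, the helper name `shuffle_patches` and the returned `order` list are all hypothetical.

```python
import random


def shuffle_patches(image, grid=3, seed=0):
    """Cut a 2D image (list of rows) into grid x grid patches and shuffle them.

    Returns the shuffled image and `order`, where order[slot] is the index of
    the original grid cell whose patch now sits in cell `slot`.
    Assumption: any rows/columns that do not divide evenly are dropped.
    """
    patch_h = len(image) // grid
    patch_w = len(image[0]) // grid
    order = list(range(grid * grid))
    random.Random(seed).shuffle(order)

    shuffled = [[0] * (patch_w * grid) for _ in range(patch_h * grid)]
    for slot, src in enumerate(order):
        src_row, src_col = divmod(src, grid)    # cell in the original image
        dst_row, dst_col = divmod(slot, grid)   # cell in the shuffled image
        for r in range(patch_h):
            for c in range(patch_w):
                shuffled[dst_row * patch_h + r][dst_col * patch_w + c] = (
                    image[src_row * patch_h + r][src_col * patch_w + c]
                )
    return shuffled, order


# Toy 6 x 6 "image" whose pixel values encode their original position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
shuffled, order = shuffle_patches(image, grid=3, seed=0)
```

Here `order` holds the original location of every patch, which is the quantity a network would be trained to recover; turning those locations into heatmap-style targets is left out because the record does not describe that step.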