MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation

Cross-view feature fusion is key to addressing the occlusion problem in human pose estimation. Current fusion methods need to train a separate model for every pair of cameras, which makes them difficult to scale. In this work, we introduce MetaFuse, a pre-trained fusion model learned from a large number of cameras in the Panoptic dataset. The model can be efficiently adapted or finetuned for a new pair of cameras using a small number of labeled images. The strong adaptation power of MetaFuse is due in large part to the proposed factorization of the original fusion model into two parts: (1) a generic fusion model shared by all cameras, and (2) lightweight camera-dependent transformations. Furthermore, the generic model is learned from many cameras by a meta-learning-style algorithm to maximize its adaptation capability to various camera poses. Our experiments show that MetaFuse, finetuned on the public datasets, outperforms the state of the art by a large margin, which validates its practical value.
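The factorization the abstract describes can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the paper's actual code: the class and parameter names (GenericFusion, CameraAdapter, theta) and the choice of a 1x1 convolution for mixing are assumptions. It shows a generic fusion module whose weights are shared by all cameras, composed with a per-camera-pair transformation small enough to finetune from a handful of labeled images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenericFusion(nn.Module):
    """Generic part, shared by all camera pairs: mixes a view's own
    heatmap with a heatmap warped over from the other view."""
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, own: torch.Tensor, warped: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([own, warped], dim=1))

class CameraAdapter(nn.Module):
    """Camera-dependent part: a lightweight 2D affine warp. Its six
    parameters are all that must be adapted for a new camera pair."""
    def __init__(self):
        super().__init__()
        # Start at the identity transform.
        self.theta = nn.Parameter(torch.tensor([[1.0, 0.0, 0.0],
                                                [0.0, 1.0, 0.0]]))

    def forward(self, other_view: torch.Tensor) -> torch.Tensor:
        n = other_view.size(0)
        grid = F.affine_grid(self.theta.expand(n, -1, -1),
                             list(other_view.size()), align_corners=False)
        return F.grid_sample(other_view, grid, align_corners=False)

# Fusing 17-joint heatmaps from two 64x64 views:
fusion, adapter = GenericFusion(17), CameraAdapter()
own, other = torch.rand(2, 17, 64, 64), torch.rand(2, 17, 64, 64)
fused = fusion(own, adapter(other))   # shape: (2, 17, 64, 64)
```

Because only the adapter is camera-specific, adding a new camera pair costs a few parameters rather than a whole fusion network, which is the scaling advantage the abstract claims.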

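The meta-learning-style training of the generic model can likewise be sketched. Below is a hedged, first-order (MAML-like) approximation that reuses GenericFusion and CameraAdapter from the sketch above; the function names, the MSE heatmap loss, and the support/query task layout are illustrative assumptions, not details taken from the paper.

```python
import copy
import torch
import torch.nn.functional as F

def inner_adapt(adapter, fusion, batch, lr=1e-2, steps=5):
    """Simulate deployment: finetune a copy of one camera pair's
    adapter on a few labeled images, keeping the fusion model fixed."""
    adapter = copy.deepcopy(adapter)
    for _ in range(steps):
        own, other, target = batch            # heatmaps + ground truth
        loss = F.mse_loss(fusion(own, adapter(other)), target)
        # Differentiate w.r.t. the adapter only; the shared fusion
        # weights receive no gradient from the inner loop.
        grads = torch.autograd.grad(loss, list(adapter.parameters()))
        with torch.no_grad():
            for p, g in zip(adapter.parameters(), grads):
                p -= lr * g
    return adapter

def meta_train_step(fusion, adapters, tasks, meta_opt):
    """Outer loop: update the generic fusion weights so that a few
    inner steps on any camera pair's adapter already give low error
    (a first-order approximation of the meta-objective)."""
    meta_opt.zero_grad()
    for pair, (support, query) in tasks.items():
        adapted = inner_adapt(adapters[pair], fusion, support)
        own, other, target = query
        F.mse_loss(fusion(own, adapted(other)), target).backward()
    meta_opt.step()                           # steps fusion.parameters()
```

Training across many Panoptic camera pairs in this fashion pushes the shared weights toward an initialization that any new camera pair can adapt from with only a few labeled images.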
Bibliographic Details

Main Authors: Xie, Rongchang; Wang, Chunyu; Wang, Yizhou
Format: Article
Language: English
Published: 2020-03-30
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition
DOI: 10.48550/arxiv.2003.13239
Online Access: https://arxiv.org/abs/2003.13239