Mixture Dense Regression for Object Detection and Human Pose Estimation

Mixture models are well-established learning approaches that, in computer vision, have mostly been applied to inverse or ill-defined problems. However, they are general-purpose divide-and-conquer techniques, splitting the input space into relatively homogeneous subsets in a data-driven manner. Not only ill-defined but also well-defined complex problems should benefit from them. To this end, we devise a framework for spatial regression using mixture density networks. We realize the framework for object detection and human pose estimation. For both tasks, a mixture model yields higher accuracy and divides the input space into interpretable modes. For object detection, mixture components focus on object scale, with the distribution of components closely following that of the ground-truth object scale. This practically alleviates the need for multi-scale testing, providing a superior speed-accuracy trade-off. For human pose estimation, a mixture model divides the data based on viewpoint and uncertainty -- namely, front and back views, with the back view imposing higher uncertainty. We conduct experiments on the MS COCO dataset and do not observe any mode collapse.
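The abstract describes dense spatial regression with a mixture density network: at each spatial location the model predicts the parameters of a small Gaussian mixture (mixing weights, means, variances) instead of a single regression value, and is trained with the mixture negative log-likelihood. The sketch below illustrates that general idea only; it is not the authors' implementation. The module name MixtureDenseHead, the PyTorch-style layout, and all dimensions are illustrative assumptions.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureDenseHead(nn.Module):
    """Mixture density network (MDN) head for dense spatial regression.

    At every spatial location the head predicts K mixture components, each
    with a mixing weight, a mean, and a diagonal log-std over a D-dimensional
    regression target (e.g. box size or keypoint offsets).
    """

    def __init__(self, in_channels, num_components=3, target_dim=2):
        super().__init__()
        self.K, self.D = num_components, target_dim
        # A single 1x1 convolution emits all mixture parameters per location:
        # K mixing logits + K*D means + K*D log-stds.
        self.param_conv = nn.Conv2d(
            in_channels, num_components * (1 + 2 * target_dim), kernel_size=1)

    def forward(self, feats):
        b, _, h, w = feats.shape
        params = self.param_conv(feats).view(b, self.K, 1 + 2 * self.D, h, w)
        log_pi = F.log_softmax(params[:, :, 0], dim=1)   # (B, K, H, W)
        mu = params[:, :, 1:1 + self.D]                   # (B, K, D, H, W)
        log_sigma = params[:, :, 1 + self.D:]             # (B, K, D, H, W)
        return log_pi, mu, log_sigma


def mdn_nll(log_pi, mu, log_sigma, target):
    """Negative log-likelihood of a diagonal-Gaussian mixture, averaged over locations.

    target: (B, D, H, W) ground-truth regression values at each location.
    """
    target = target.unsqueeze(1)  # (B, 1, D, H, W), broadcasts over the K components
    # Per-component Gaussian log-density, summed over the D target dimensions.
    comp_log_prob = (-0.5 * ((target - mu) / log_sigma.exp()) ** 2
                     - log_sigma - 0.5 * math.log(2 * math.pi)).sum(dim=2)
    # Marginalize over components in log space (log-sum-exp of weighted densities).
    return -torch.logsumexp(log_pi + comp_log_prob, dim=1).mean()

At test time the component with the largest predicted weight (or the mixture mean) can be taken as the regression output, and inspecting which component is selected per object is one way the learned modes can be related to structure such as object scale or viewpoint, as the abstract describes.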

Bibliographic Details
Published in: arXiv.org, 2020-04
Main authors: Varamesh, Ali; Tuytelaars, Tinne
Format: Article
Language: eng
Subjects: Collapse; Computer vision; Ground truth; Machine learning; Model accuracy; Object recognition; Probabilistic models; Uncertainty
Online access: Full text
container_title arXiv.org
creator Varamesh, Ali
Tuytelaars, Tinne
description Mixture models are well-established learning approaches that, in computer vision, have mostly been applied to inverse or ill-defined problems. However, they are general-purpose divide-and-conquer techniques, splitting the input space into relatively homogeneous subsets in a data-driven manner. Not only ill-defined but also well-defined complex problems should benefit from them. To this end, we devise a framework for spatial regression using mixture density networks. We realize the framework for object detection and human pose estimation. For both tasks, a mixture model yields higher accuracy and divides the input space into interpretable modes. For object detection, mixture components focus on object scale, with the distribution of components closely following that of the ground-truth object scale. This practically alleviates the need for multi-scale testing, providing a superior speed-accuracy trade-off. For human pose estimation, a mixture model divides the data based on viewpoint and uncertainty -- namely, front and back views, with the back view imposing higher uncertainty. We conduct experiments on the MS COCO dataset and do not observe any mode collapse.
format Article
identifier EISSN: 2331-8422
ispartof arXiv.org, 2020-04
issn 2331-8422
language eng
source Free E-Journals
subjects Collapse
Computer vision
Ground truth
Machine learning
Model accuracy
Object recognition
Probabilistic models
Uncertainty
title Mixture Dense Regression for Object Detection and Human Pose Estimation