Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images

Deep learning based object detectors require thousands of diversified bounding box and class annotated examples. Though image object detectors have shown rapid progress in recent years with the release of multiple large-scale static image datasets, object detection on videos still remains an open problem due to scarcity of annotated video frames.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Lahiri, Avisek; Reddy, Charan; Biswas, Prabir Kumar
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Request full text
creator Lahiri, Avisek; Reddy, Charan; Biswas, Prabir Kumar
description Deep learning based object detectors require thousands of diversified bounding box and class annotated examples. Though image object detectors have shown rapid progress in recent years with the release of multiple large-scale static image datasets, object detection on videos still remains an open problem due to scarcity of annotated video frames. Having a robust video object detector is an essential component for video understanding and curating large-scale automated annotations in videos. Domain difference between images and videos makes the transferability of image object detectors to videos sub-optimal. The most common solution is to use weakly supervised annotations where a video frame has to be tagged for presence/absence of object categories. This still takes up manual effort. In this paper we take a step forward by adapting the concept of unsupervised adversarial image-to-image translation to perturb static high quality images to be visually indistinguishable from a set of video frames. We assume the presence of a fully annotated static image dataset and an unannotated video dataset. Object detector is trained on adversarially transformed image dataset using the annotations of the original dataset. Experiments on Youtube-Objects and Youtube-Objects-Subset datasets with two contemporary baseline object detectors reveal that such unsupervised pixel level domain adaptation boosts the generalization performance on video frames compared to direct application of original image object detector. Also, we achieve competitive performance compared to recent baselines of weakly supervised methods. This paper can be seen as an application of image translation for cross domain object detection.
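The pipeline the abstract describes (adversarially perturb annotated source images until a discriminator cannot tell them from unannotated video frames, then train the detector on the perturbed images with the original labels) can be illustrated with a minimal numpy sketch. This is not the paper's actual architecture: real images are replaced by 8-dimensional feature vectors, the generator `G` is a single linear map, and the discriminator `w_d` is a logistic classifier. All variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: "images" and "video frames" are 8-dim feature vectors
# drawn from slightly different distributions (the domain gap).
dim = 8
images = rng.normal(0.0, 1.0, (64, dim))   # annotated source (image) domain
frames = rng.normal(0.5, 1.0, (64, dim))   # unannotated target (video) domain

G = np.eye(dim)          # generator: starts as identity (no perturbation)
w_d = np.zeros(dim)      # discriminator: logistic weights (frame vs. fake)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

lr = 0.05
for _ in range(200):
    fake = images @ G.T                     # translated images
    # Discriminator step: real frames -> label 1, translated images -> 0.
    x = np.vstack([frames, fake])
    y = np.concatenate([np.ones(len(frames)), np.zeros(len(fake))])
    w_d += lr * x.T @ (y - sigmoid(x @ w_d)) / len(x)
    # Generator step: gradient ascent on log D(G(x)), i.e. push the
    # translated images toward what the discriminator calls a "frame".
    p_fake = sigmoid(fake @ w_d)
    grad_fake = ((1.0 - p_fake)[:, None] * w_d) / len(fake)
    G += lr * grad_fake.T @ images

translated = images @ G.T
# A detector would now be trained on `translated` together with the
# bounding-box annotations of the ORIGINAL images; the labels carry over
# because the perturbation changes appearance, not object locations.
```

The key design point the abstract emphasizes is the last step: because translation is purely visual, the expensive source annotations are reused unchanged for the target domain.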
doi_str_mv 10.48550/arxiv.1810.02074
format Article
creationdate 2018-10-04
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1810.02074
language eng
recordid cdi_arxiv_primary_1810_02074
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T20%3A25%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Unsupervised%20Adversarial%20Visual%20Level%20Domain%20Adaptation%20for%20Learning%20Video%20Object%20Detectors%20from%20Images&rft.au=Lahiri,%20Avisek&rft.date=2018-10-04&rft_id=info:doi/10.48550/arxiv.1810.02074&rft_dat=%3Carxiv_GOX%3E1810_02074%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true