Crowd Counting via Unsupervised Cross-Domain Feature Adaptation
Given an image, crowd counting aims to estimate the number of target objects in the image. With unpredictable installation conditions of surveillance systems (or other equipment), crowd counting images from different data sets may exhibit severe discrepancies in viewing angle, scale, lighting cond...
Saved in:
Published in: | IEEE transactions on multimedia 2023, Vol.25, p.4665-4678 |
---|---|
Main authors: | Ding, Guanchen ; Yang, Daiqin ; Wang, Tao ; Wang, Sihan ; Zhang, Yunfei |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 4678 |
---|---|
container_issue | |
container_start_page | 4665 |
container_title | IEEE transactions on multimedia |
container_volume | 25 |
creator | Ding, Guanchen ; Yang, Daiqin ; Wang, Tao ; Wang, Sihan ; Zhang, Yunfei |
description | Given an image, crowd counting aims to estimate the number of target objects in the image. With unpredictable installation conditions of surveillance systems (or other equipment), crowd counting images from different data sets may exhibit severe discrepancies in viewing angle, scale, lighting condition, etc. As it is usually expensive and time-consuming to annotate each data set for model training, an essential issue in crowd counting is transferring a model well trained on a labeled data set (source domain) to a new data set (target domain). To tackle this problem, we propose a cross-domain learning network that learns the domain gaps in an unsupervised manner. The proposed network comprises a Multi-granularity Feature-aware Discriminator (MFD) module, a Domain-invariant Feature Adaptation (DFA) module, and a Cross-domain Vanishing Bridge (CVB) module, which together remove domain-specific information from the extracted features and improve the mapping performance of the network. Unlike most existing methods, which use only a Global Feature Discriminator (GFD) to align features at the image level, an additional Local Feature Discriminator (LFD) is inserted and, together with the GFD, forms the MFD module. As a complement to the GFD, the LFD refines features at the pixel level and is able to align local features. The DFA module explicitly measures the distance between source-domain and target-domain features and aligns the marginal distributions of the features with Maximum Mean Discrepancy (MMD). Finally, the CVB module provides an incremental capability of removing the interfering parts of the extracted features. Several well-known networks are adopted as the backbone of our algorithm to prove the effectiveness of the proposed adaptation structure. Comprehensive experiments demonstrate that our model achieves performance competitive with state-of-the-art methods. |
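The abstract states that the DFA module aligns the marginal distributions of source- and target-domain features with Maximum Mean Discrepancy (MMD). As a minimal sketch of that distance (not the authors' code; the RBF kernel choice, bandwidth `sigma`, and the random feature batches are illustrative assumptions), the biased MMD estimator between two feature batches can be written as:

```python
# Hypothetical sketch of the Maximum Mean Discrepancy (MMD) distance between
# two batches of domain features, using an RBF kernel. Kernel choice and
# bandwidth are assumptions for illustration, not the paper's settings.
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between rows of x and rows of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased squared-MMD estimate between feature batches of shape (n, d)."""
    k_ss = rbf_kernel(source, source, sigma)
    k_tt = rbf_kernel(target, target, sigma)
    k_st = rbf_kernel(source, target, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

# Stand-in "features": a shifted target batch yields a larger distance than
# comparing a batch against itself, which is exactly what an alignment loss
# penalizes.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 8))
tgt = rng.normal(2.0, 1.0, size=(64, 8))
print(mmd2(src, src) < mmd2(src, tgt))  # → True
```

Minimizing such a term over the feature extractor pulls the two feature distributions together, which is the role the description assigns to the DFA module.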
doi_str_mv | 10.1109/TMM.2022.3180222 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2023, Vol.25, p.4665-4678 |
issn | 1520-9210 1941-0077 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TMM_2022_3180222 |
source | IEEE/IET Electronic Library (IEL) |
subjects | Adaptation ; Adaptation models ; adversarial learning ; Algorithms ; Annotations ; Bridges ; Crowd monitoring ; Data models ; Datasets ; density map estimation ; Discriminators ; domain adaptation ; Feature extraction ; Modules ; Semantics ; Surveillance systems ; Training ; Unsupervised crowd counting ; Unsupervised learning |
title | Crowd Counting via Unsupervised Cross-Domain Feature Adaptation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T08%3A28%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Crowd%20Counting%20via%20Unsupervised%20Cross-Domain%20Feature%20Adaptation&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Ding,%20Guanchen&rft.date=2023&rft.volume=25&rft.spage=4665&rft.epage=4678&rft.pages=4665-4678&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2022.3180222&rft_dat=%3Cproquest_RIE%3E2884893469%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2884893469&rft_id=info:pmid/&rft_ieee_id=9788041&rfr_iscdi=true |