Discriminative Local Representation Learning for Cross-Modality Visible-Thermal Person Re-Identification

Visible-thermal person re-identification (VTReID) is a rising and challenging cross-modality retrieval task in intelligent video surveillance systems. Most attention architectures cannot explore the discriminative person representations for VTReID, especially in the thermal modality. In addition, the fine-grained middle-level semantic information has received much less attention in the part-based approaches for the cross-modality pedestrian retrieval task, resulting in limited generalization capability and poor representation robustness. This paper proposes a simple yet powerful discriminative local representation learning (DLRL) model to capture the robust local fine-grained feature representations and explore the rich semantic relationship between the learned part features. Specifically, an efficient contextual attention aggregation module (CAAM) is designed to strengthen the discriminative capability of the feature representations and explore the contextual cues for visible and thermal modalities. Then, an integrated middle-high feature learning (IMHF) method is introduced to capture the part-level salient representations, which handles the ambiguous modality discrepancy in both discriminative middle-level and robust high-level information. Moreover, a part-guided graph convolution module (PGCM) is constructed to mine the structural relationship among the part representations within each modality. The quantitative and qualitative experiments on the two benchmark datasets demonstrate that the proposed DLRL model significantly outperforms state-of-the-art methods and achieves rank-1/mAP accuracy of 92.77%/82.05% on the RegDB dataset and 63.04%/60.58% on the SYSU-MM01 dataset.
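
The abstract names three components (a contextual attention aggregation module, an integrated middle-high feature learning method, and a part-guided graph convolution module), but this record contains no implementation details. As a rough illustration of the last idea only, the following is a minimal sketch, assuming PyTorch, of a graph convolution over a small set of part-level features so that neighbouring body parts can exchange information; the class name, feature sizes, and the chain-shaped adjacency are hypothetical and this is not the authors' DLRL/PGCM code.

```python
# Illustrative sketch only -- not the authors' released code. It shows, under
# assumed shapes and a hypothetical adjacency, how a graph convolution over
# part-level person features could be wired up in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartGraphConv(nn.Module):
    """One graph-convolution layer over num_parts part features per image."""

    def __init__(self, in_dim: int, out_dim: int, num_parts: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)
        # Hypothetical learnable adjacency among parts, initialised so that each
        # part is connected to itself and its vertical neighbours (a chain over
        # horizontal body stripes).
        adj = torch.eye(num_parts)
        for i in range(num_parts - 1):
            adj[i, i + 1] = adj[i + 1, i] = 1.0
        self.adj = nn.Parameter(adj)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: (batch, num_parts, in_dim)
        # Row-normalise the adjacency so each part aggregates a weighted mix of
        # its neighbours, then apply the shared linear transform.
        a = F.softmax(self.adj, dim=-1)
        return F.relu(self.weight(torch.matmul(a, parts)))


if __name__ == "__main__":
    # Toy usage with assumed sizes: 6 part stripes, 2048-d backbone features.
    x = torch.randn(8, 6, 2048)
    gcn = PartGraphConv(in_dim=2048, out_dim=512, num_parts=6)
    print(gcn(x).shape)  # torch.Size([8, 6, 512])
```

Making the adjacency a learnable, softmax-normalised matrix is just one plausible way to let the network discover structural relationships among parts; the paper itself may define the graph differently.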

Bibliographic Details
Published in: IEEE Transactions on Biometrics, Behavior, and Identity Science, 2023-01, Vol. 5 (1), pp. 1-14
Authors: Wu, Yong; He, Guo-Dui; Wen, Li-Hua; Qin, Xiao; Yuan, Chang-An; Gribova, Valeriya; Filaretov, Vladimir Fedorovich; Huang, De-Shuang
Format: Article
Language: English
Subjects: attention mechanism; Cameras; Convolution; cross-modality; Datasets; Feature extraction; graph convolution; middle-level features; Modules; Person re-identification; Representation learning; Representations; Retrieval; Robustness; Semantics; Surveillance systems; Task analysis; Training
Online access: Order full text
DOI: 10.1109/TBIOM.2022.3184525
ISSN: 2637-6407