Motion Feature Aggregation for Video-Based Person Re-Identification

Most video-based person re-identification (re-id) methods focus only on appearance features and neglect motion features. In fact, motion features can help to distinguish target persons that are hard to identify by appearance features alone. However, most existing temporal information modeling methods cannot extract motion features effectively or efficiently for video-based re-id. In this paper, we propose a more efficient Motion Feature Aggregation (MFA) method to model and aggregate motion information at the feature map level for video-based re-id.

Detailed Description

Saved in:
Bibliographic details
Published in: IEEE transactions on image processing 2022, Vol.31, p.3908-3919
Main authors: Gu, Xinqian, Chang, Hong, Ma, Bingpeng, Shan, Shiguang
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 3919
container_issue
container_start_page 3908
container_title IEEE transactions on image processing
container_volume 31
creator Gu, Xinqian
Chang, Hong
Ma, Bingpeng
Shan, Shiguang
description Most video-based person re-identification (re-id) methods focus only on appearance features and neglect motion features. In fact, motion features can help to distinguish target persons that are hard to identify by appearance features alone. However, most existing temporal information modeling methods cannot extract motion features effectively or efficiently for video-based re-id. In this paper, we propose a more efficient Motion Feature Aggregation (MFA) method to model and aggregate motion information at the feature map level for video-based re-id. The proposed MFA consists of (i) a coarse-grained motion learning module, which extracts coarse-grained motion features based on the position changes of body parts over time, and (ii) a fine-grained motion learning module, which extracts fine-grained motion features based on the appearance changes of body parts over time. These two modules model motion information at different granularities and are complementary to each other. The proposed method is easy to combine with existing network architectures for end-to-end training. Extensive experiments on four widely used datasets demonstrate that the motion features extracted by MFA are crucial complements to appearance features for video-based re-id, especially in scenarios with large appearance changes. Besides, the results on LS-VID, the current largest publicly available video-based re-id dataset, surpass the state-of-the-art methods by a large margin. The code is available at: https://github.com/guxinqian/Simple-ReID .
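The two-granularity design described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see the linked GitHub repository for that); the function names `motion_feature_aggregation` and `soft_centroid` are hypothetical, and it simplifies MFA to two proxies: coarse-grained motion as the frame-to-frame displacement of a soft feature centroid (position change), and fine-grained motion as the frame-to-frame difference of the feature maps themselves (appearance change), aggregated with the average appearance feature.

```python
import numpy as np

def soft_centroid(feat):
    """feat: (C, H, W) feature map. Returns the (y, x) centroid of the
    feature-energy map, a crude proxy for a body-part position."""
    energy = np.abs(feat).sum(axis=0)              # (H, W) per-location energy
    w = energy / (energy.sum() + 1e-8)             # normalize to a soft attention map
    ys, xs = np.mgrid[0:feat.shape[1], 0:feat.shape[2]]
    return np.array([(w * ys).sum(), (w * xs).sum()])

def motion_feature_aggregation(seq):
    """seq: (T, C, H, W) frame-level feature maps for one clip.
    Returns a clip descriptor concatenating appearance features with
    coarse-grained (position-change) and fine-grained (appearance-change)
    motion cues."""
    T = seq.shape[0]
    appearance = seq.mean(axis=(0, 2, 3))                      # (C,) mean appearance
    # fine-grained motion: appearance change between consecutive frames
    fine = np.abs(np.diff(seq, axis=0)).mean(axis=(0, 2, 3))   # (C,)
    # coarse-grained motion: displacement of the soft centroid over time
    cents = np.stack([soft_centroid(seq[t]) for t in range(T)])  # (T, 2)
    coarse = np.abs(np.diff(cents, axis=0)).mean(axis=0)         # (2,) mean |dy|, |dx|
    return np.concatenate([appearance, fine, coarse])

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 4, 4)).astype(np.float32)  # T=8 frames, C=16 channels
desc = motion_feature_aggregation(clip)
print(desc.shape)  # (34,) = 16 appearance + 16 fine + 2 coarse
```

In the paper itself, both modules operate on learned feature maps inside the network and are trained end-to-end; the sketch only conveys why position changes and appearance changes are complementary signals.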
doi_str_mv 10.1109/TIP.2022.3175593
format Article
publisher United States: IEEE
pmid 35622788
coden IIPRE4
fulltext fulltext_linktorsrc
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2022, Vol.31, p.3908-3919
issn 1057-7149
1941-0042
language eng
recordid cdi_pubmed_primary_35622788
source IEEE Electronic Library (IEL)
subjects Agglomeration
Body parts
Computational modeling
Computer architecture
Data mining
Datasets
Feature extraction
Feature maps
Learning
Modules
motion feature extraction
Motion perception
Optical imaging
Spatiotemporal phenomena
temporal information modeling
Tracking
Training
Video-based person re-identification
title Motion Feature Aggregation for Video-Based Person Re-Identification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T04%3A44%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Motion%20Feature%20Aggregation%20for%20Video-Based%20Person%20Re-Identification&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Gu,%20Xinqian&rft.date=2022&rft.volume=31&rft.spage=3908&rft.epage=3919&rft.pages=3908-3919&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2022.3175593&rft_dat=%3Cproquest_RIE%3E2671265595%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2675048988&rft_id=info:pmid/35622788&rft_ieee_id=9784400&rfr_iscdi=true