Motion Feature Aggregation for Video-Based Person Re-Identification
Saved in:
Published in: | IEEE transactions on image processing 2022, Vol.31, p.3908-3919 |
---|---|
Main authors: | Gu, Xinqian ; Chang, Hong ; Ma, Bingpeng ; Shan, Shiguang |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
container_end_page | 3919 |
---|---|
container_issue | |
container_start_page | 3908 |
container_title | IEEE transactions on image processing |
container_volume | 31 |
creator | Gu, Xinqian Chang, Hong Ma, Bingpeng Shan, Shiguang |
description | Most video-based person re-identification (re-id) methods only focus on appearance features but neglect motion features. In fact, motion features can help to distinguish target persons who are hard to identify by appearance features alone. However, most existing temporal information modeling methods cannot extract motion features effectively or efficiently for video-based re-id. In this paper, we propose a more efficient Motion Feature Aggregation (MFA) method to model and aggregate motion information in the feature map level for video-based re-id. The proposed MFA consists of (i) a coarse-grained motion learning module, which extracts coarse-grained motion features based on the position changes of body parts over time, and (ii) a fine-grained motion learning module, which extracts fine-grained motion features based on the appearance changes of body parts over time. These two modules can model motion information from different granularities and are complementary to each other. It is easy to combine the proposed method with existing network architectures for end-to-end training. Extensive experiments on four widely used datasets demonstrate that the motion features extracted by MFA are crucial complements to appearance features for video-based re-id, especially for scenarios with large appearance changes. Besides, the results on LS-VID, the current largest publicly available video-based re-id dataset, surpass the state-of-the-art methods by a large margin. The code is available at: https://github.com/guxinqian/Simple-ReID. |
doi_str_mv | 10.1109/TIP.2022.3175593 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1057-7149 |
ispartof | IEEE transactions on image processing, 2022, Vol.31, p.3908-3919 |
issn | 1057-7149 1941-0042 |
language | eng |
recordid | cdi_pubmed_primary_35622788 |
source | IEEE Electronic Library (IEL) |
subjects | Agglomeration Body parts Computational modeling Computer architecture Data mining Datasets Feature extraction Feature maps Learning Modules motion feature extraction Motion perception Optical imaging Spatiotemporal phenomena temporal information modeling Tracking Training Video-based person re-identification |
title | Motion Feature Aggregation for Video-Based Person Re-Identification |
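The two-branch design described in the abstract (coarse-grained motion from body-part position changes, fine-grained motion from body-part appearance changes, then aggregation) can be illustrated with a minimal sketch. All function names, feature shapes, and the concatenate-then-average aggregation below are illustrative assumptions for exposition, not the authors' implementation; the actual method operates on feature maps inside a deep network (see the linked Simple-ReID repository).

```python
# Hypothetical sketch of the two-granularity Motion Feature Aggregation (MFA)
# idea from the abstract. Shapes and the aggregation rule are assumptions.

def coarse_motion(part_positions):
    """Coarse-grained motion: frame-to-frame position changes of a body part."""
    return [[c - p for p, c in zip(prev, curr)]
            for prev, curr in zip(part_positions, part_positions[1:])]

def fine_motion(part_appearances):
    """Fine-grained motion: frame-to-frame appearance changes of a body part."""
    return [[c - p for p, c in zip(prev, curr)]
            for prev, curr in zip(part_appearances, part_appearances[1:])]

def aggregate(coarse, fine):
    """Fuse the two complementary granularities by concatenating them per time
    step, then averaging over time into one clip-level motion descriptor."""
    joined = [c + f for c, f in zip(coarse, fine)]
    dim = len(joined[0])
    return [sum(step[i] for step in joined) / len(joined) for i in range(dim)]

# Toy clip: 3 frames, one body part with a 2-D position and a 2-D appearance code.
positions = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
appearances = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
motion_descriptor = aggregate(coarse_motion(positions), fine_motion(appearances))
print(motion_descriptor)  # 4-D: mean position delta followed by mean appearance delta
```

The complementarity claimed in the abstract shows up even in this toy: the first half of the descriptor reflects where the part moved, the second half how its appearance changed, and either alone would miss the other cue.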