Motion Feature Aggregation for Video-Based Person Re-Identification

Most video-based person re-identification (re-id) methods focus only on appearance features and neglect motion features. In fact, motion features can help to distinguish target persons that are hard to identify by appearance features alone. However, most existing temporal information modeling methods cannot extract motion features effectively or efficiently for video-based re-id. In this paper, we propose a more efficient Motion Feature Aggregation (MFA) method to model and aggregate motion information at the feature map level for video-based re-id.

Detailed Description

Saved in:
Bibliographic details
Published in: IEEE transactions on image processing 2022, Vol.31, p.3908-3919
Main authors: Gu, Xinqian, Chang, Hong, Ma, Bingpeng, Shan, Shiguang
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 3919
container_issue
container_start_page 3908
container_title IEEE transactions on image processing
container_volume 31
creator Gu, Xinqian
Chang, Hong
Ma, Bingpeng
Shan, Shiguang
description Most video-based person re-identification (re-id) methods focus only on appearance features and neglect motion features. In fact, motion features can help to distinguish target persons that are hard to identify by appearance features alone. However, most existing temporal information modeling methods cannot extract motion features effectively or efficiently for video-based re-id. In this paper, we propose a more efficient Motion Feature Aggregation (MFA) method to model and aggregate motion information at the feature map level for video-based re-id. The proposed MFA consists of (i) a coarse-grained motion learning module, which extracts coarse-grained motion features based on the position changes of body parts over time, and (ii) a fine-grained motion learning module, which extracts fine-grained motion features based on the appearance changes of body parts over time. These two modules model motion information at different granularities and are complementary to each other. The proposed method is easy to combine with existing network architectures for end-to-end training. Extensive experiments on four widely used datasets demonstrate that the motion features extracted by MFA are crucial complements to appearance features for video-based re-id, especially in scenarios with large appearance changes. Besides, the results on LS-VID, the current largest publicly available video-based re-id dataset, surpass the state-of-the-art methods by a large margin. The code is available at: https://github.com/guxinqian/Simple-ReID .
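The two-granularity design described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see the linked GitHub repository for that); the function names `motion_feature_aggregation` and `soft_centroid` are hypothetical, and it simplifies MFA to two proxies: coarse-grained motion as the frame-to-frame displacement of a soft feature centroid (position change), and fine-grained motion as the frame-to-frame difference of the feature maps themselves (appearance change), aggregated with the average appearance feature.

```python
import numpy as np

def soft_centroid(feat):
    """feat: (C, H, W) feature map. Returns the (y, x) centroid of the
    feature-energy map, a crude proxy for a body-part position."""
    energy = np.abs(feat).sum(axis=0)              # (H, W) per-location energy
    w = energy / (energy.sum() + 1e-8)             # normalize to a soft attention map
    ys, xs = np.mgrid[0:feat.shape[1], 0:feat.shape[2]]
    return np.array([(w * ys).sum(), (w * xs).sum()])

def motion_feature_aggregation(seq):
    """seq: (T, C, H, W) frame-level feature maps for one clip.
    Returns a clip descriptor concatenating appearance features with
    coarse-grained (position-change) and fine-grained (appearance-change)
    motion cues."""
    T = seq.shape[0]
    appearance = seq.mean(axis=(0, 2, 3))                      # (C,) mean appearance
    # fine-grained motion: appearance change between consecutive frames
    fine = np.abs(np.diff(seq, axis=0)).mean(axis=(0, 2, 3))   # (C,)
    # coarse-grained motion: displacement of the soft centroid over time
    cents = np.stack([soft_centroid(seq[t]) for t in range(T)])  # (T, 2)
    coarse = np.abs(np.diff(cents, axis=0)).mean(axis=0)         # (2,) mean |dy|, |dx|
    return np.concatenate([appearance, fine, coarse])

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 4, 4)).astype(np.float32)  # T=8 frames, C=16 channels
desc = motion_feature_aggregation(clip)
print(desc.shape)  # (34,) = 16 appearance + 16 fine + 2 coarse
```

In the paper itself, both modules operate on learned feature maps inside the network and are trained end-to-end; the sketch only conveys why position changes and appearance changes are complementary signals.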
doi_str_mv 10.1109/TIP.2022.3175593
format Article
publisher United States: IEEE
pmid 35622788
coden IIPRE4
fulltext fulltext_linktorsrc
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2022, Vol.31, p.3908-3919
issn 1057-7149
1941-0042
language eng
recordid cdi_pubmed_primary_35622788
source IEEE Electronic Library (IEL)
subjects Agglomeration
Body parts
Computational modeling
Computer architecture
Data mining
Datasets
Feature extraction
Feature maps
Learning
Modules
motion feature extraction
Motion perception
Optical imaging
Spatiotemporal phenomena
temporal information modeling
Tracking
Training
Video-based person re-identification
title Motion Feature Aggregation for Video-Based Person Re-Identification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T04%3A44%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Motion%20Feature%20Aggregation%20for%20Video-Based%20Person%20Re-Identification&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Gu,%20Xinqian&rft.date=2022&rft.volume=31&rft.spage=3908&rft.epage=3919&rft.pages=3908-3919&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2022.3175593&rft_dat=%3Cproquest_RIE%3E2671265595%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2675048988&rft_id=info:pmid/35622788&rft_ieee_id=9784400&rfr_iscdi=true