Label Independent Memory for Semi-Supervised Few-Shot Video Classification


Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022-01, Vol. 44 (1), p. 273-285
Main Authors: Zhu, Linchao; Yang, Yi
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description: In this paper, we propose to leverage freely available unlabeled video data to facilitate few-shot video classification. In this semi-supervised few-shot video classification task, millions of unlabeled videos are available for each episode during training. These videos can be extremely imbalanced, while they have profound visual and motion dynamics. To tackle the semi-supervised few-shot video classification problem, we make the following contributions. First, we propose a label independent memory (LIM) to cache label related features, which enables a similarity search over a large set of videos. LIM produces a class prototype for few-shot training. This prototype is an aggregated embedding for each class, which is more robust to noisy video features. Second, we integrate a multi-modality compound memory network to capture both RGB and flow information. We propose to store the RGB and flow representations in two separate memory networks, but they are jointly optimized via a unified loss. In this way, mutual communications between the two modalities are leveraged to achieve better classification performance. Third, we conduct extensive experiments on the few-shot Kinetics-100 and Something-Something-100 datasets, which validate the effectiveness of leveraging the accessible unlabeled data for few-shot classification.
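The prototype-aggregation idea in the abstract, retrieving similar unlabeled features from a cached memory and folding them into a class prototype, can be illustrated with a minimal sketch. All function names, shapes, and the similarity-weighted aggregation scheme below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def class_prototype(support_feats, memory, k=5):
    """Build a class prototype from a few labeled support embeddings plus
    the top-k most similar entries retrieved from an unlabeled feature
    memory. Illustrative sketch only; names and weighting are assumptions."""
    # Mean embedding of the few labeled support examples for this class.
    query = support_feats.mean(axis=0)
    # Cosine similarity between the query and every cached memory slot.
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    sims = m @ q
    # Retrieve the k nearest unlabeled neighbours.
    topk = np.argsort(sims)[-k:]
    # Similarity-weighted aggregation: less similar (noisier) retrieved
    # features contribute smaller weights to the prototype.
    w = np.exp(sims[topk])
    w /= w.sum()
    proto = (w[:, None] * memory[topk]).sum(axis=0) + query
    return proto / np.linalg.norm(proto)
```

A dual-modality variant, per the abstract, would keep two such memories (RGB and optical flow) and train both under one unified loss so the modalities inform each other.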
DOI: 10.1109/TPAMI.2020.3007511
Publisher: IEEE, New York
PMID: 32750804
CODEN: ITPIDJ
Rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
ISSN: 0162-8828
EISSN: 1939-3539, 2160-9292
Source: IEEE Electronic Library (IEL)
Subjects: Classification
compound memory networks
Compounds
Data models
Dynamics
Feature extraction
Few-shot video classification
memory-augmented neural networks
Prototypes
semi-supervised learning
Task analysis
Training
Video data