Label Propagated Nonnegative Matrix Factorization for Clustering
Semi-supervised learning (SSL), which uses plenty of unlabeled examples to boost learning from limited labeled examples, is a powerful learning paradigm with wide real-world applications such as information retrieval and document clustering. Label propagation (LP) is a popular...
Saved in:
Published in: | IEEE transactions on knowledge and data engineering 2022-01, Vol.34 (1), p.340-351 |
---|---|
Main authors: | Lan, Long; Liu, Tongliang; Zhang, Xiang; Xu, Chuanfu; Luo, Zhigang |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 351 |
---|---|
container_issue | 1 |
container_start_page | 340 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 34 |
creator | Lan, Long; Liu, Tongliang; Zhang, Xiang; Xu, Chuanfu; Luo, Zhigang |
description | Semi-supervised learning (SSL), which uses plenty of unlabeled examples to boost learning from limited labeled examples, is a powerful learning paradigm with wide real-world applications such as information retrieval and document clustering. Label propagation (LP) is a popular SSL method that propagates labels through the dataset along high-density areas defined by unlabeled examples, but it is fragile to bridge examples. Semi-supervised K-Means uses labeled examples to initialize clustering centers that separate different examples; however, it fails on imbalanced data, that is, when the number of examples per class varies significantly. This paper proposes a novel label propagated nonnegative matrix factorization method (LPNMF) to handle cleanly labeled but biased data, and its extension LPNMF-E to handle noisily labeled data, within the framework of NMF. LPNMF decomposes the whole dataset into the product of a basis matrix and a coefficient matrix. To propagate labels to unlabeled examples, LPNMF treats the class indicators of labeled examples as their coefficients and iteratively updates both the basis matrix and the coefficients of the unlabeled examples. LPNMF combines the merits of semi-supervised K-Means and label propagation to remedy their respective shortcomings. Specifically, on the one hand, LPNMF learns representative clustering centers from the distribution of the dataset, similar to semi-supervised K-Means, and is thus robust to bridge examples. On the other hand, LPNMF propagates labels according to the affinity between examples, similar to label propagation, and thus relieves the bias problem. Moreover, we introduce an LPNMF extension, LPNMF-E, to handle the noisy-label case by relaxing the constraint on labeled examples.
Since each labeled example also obtains label information from the global distribution of the whole dataset and the local manifold of its neighbors, LPNMF-E outputs reliable class indicators even if a portion of the examples are incorrectly labeled. Theoretical analyses of the generalization ability of the proposed models are also provided. Experimental results on both clean and noisy labeled datasets confirm the effectiveness of LPNMF and LPNMF-E compared with both LP and representative semi-supervised K-Means algorithms. |
doi_str_mv | 10.1109/TKDE.2020.2982387 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2022-01, Vol.34 (1), p.340-351 |
issn | 1041-4347; 1558-2191 |
language | eng |
recordid | cdi_proquest_journals_2607876969 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms; Bridges; Clustering; Clustering algorithms; Coefficients; Datasets; Factorization; Indicators; Information retrieval; K-means; label propagation; Labels; Manifolds; Matrix decomposition; Noise measurement; Nonnegative matrix factorization; Propagation; Semi-supervised learning; Semisupervised learning; Symmetric matrices |
title | Label Propagated Nonnegative Matrix Factorization for Clustering |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T03%3A06%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Label%20Propagated%20Nonnegative%20Matrix%20Factorization%20for%20Clustering&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Lan,%20Long&rft.date=2022-01-01&rft.volume=34&rft.issue=1&rft.spage=340&rft.epage=351&rft.pages=340-351&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2020.2982387&rft_dat=%3Cproquest_RIE%3E2607876969%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2607876969&rft_id=info:pmid/&rft_ieee_id=9044402&rfr_iscdi=true |
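The core mechanism the abstract describes — clamp the coefficients of labeled examples to their class indicators, then alternately update the basis matrix and the coefficients of the unlabeled examples — can be sketched with standard multiplicative NMF update rules (Lee-Seung). This is an illustrative reconstruction, not the paper's exact algorithm; the function name `lpnmf_sketch` and the toy data are invented for the example.

```python
import numpy as np

def lpnmf_sketch(X, Y_l, n_labeled, k, n_iter=300, eps=1e-9):
    """Factor X (d x n) as W @ H with nonnegative W and H.

    The first `n_labeled` columns of H are clamped to the class
    indicators Y_l (k x n_labeled); only the basis W and the
    coefficients of the unlabeled examples are updated, using
    standard multiplicative NMF rules.
    """
    d, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((d, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    H[:, :n_labeled] = Y_l                       # clamp labeled coefficients
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # update the basis matrix
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H[:, n_labeled:] = H_new[:, n_labeled:]  # update unlabeled columns only
    return W, H

# Toy data: class 0 lives in the first three features, class 1 in the
# last two; 4 labeled points (2 per class) followed by 4 unlabeled.
rng = np.random.default_rng(1)
cls0 = np.vstack([rng.uniform(1.5, 2.5, (3, 4)), rng.uniform(0.0, 0.1, (2, 4))])
cls1 = np.vstack([rng.uniform(0.0, 0.1, (3, 4)), rng.uniform(1.5, 2.5, (2, 4))])
X = np.hstack([cls0[:, :2], cls1[:, :2], cls0[:, 2:], cls1[:, 2:]])
Y_l = np.array([[1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]])           # indicators of the labeled points
W, H = lpnmf_sketch(X, Y_l, n_labeled=4, k=2)
labels = H[:, 4:].argmax(axis=0)                 # predicted classes of unlabeled points
```

Because the clamped indicators tie each basis column to one class, the argmax over the learned coefficients assigns each unlabeled point to the class whose basis best reconstructs it — the label-propagation-through-factorization idea the abstract attributes to LPNMF, without the paper's additional machinery (e.g. the relaxed constraints of LPNMF-E).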