Label Propagated Nonnegative Matrix Factorization for Clustering
Semi-supervised learning (SSL), which uses plenty of unlabeled examples to boost learning from limited labeled examples, is a powerful learning paradigm with wide real-world applications such as information retrieval and document clustering. Label propagation (LP) is a popular...
Saved in:
Published in: | IEEE transactions on knowledge and data engineering 2022-01, Vol.34 (1), p.340-351 |
---|---|
Main authors: | Lan, Long; Liu, Tongliang; Zhang, Xiang; Xu, Chuanfu; Luo, Zhigang |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 351 |
---|---|
container_issue | 1 |
container_start_page | 340 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 34 |
creator | Lan, Long; Liu, Tongliang; Zhang, Xiang; Xu, Chuanfu; Luo, Zhigang |
description | Semi-supervised learning (SSL), which uses plenty of unlabeled examples to boost learning from limited labeled examples, is a powerful learning paradigm with wide real-world applications such as information retrieval and document clustering. Label propagation (LP) is a popular SSL method that propagates labels through the dataset along high-density areas defined by unlabeled examples, but it is fragile to bridge examples. Semi-supervised K-Means uses labeled examples to initialize clustering centers that separate different examples; however, it fails on imbalanced data, that is, when the number of examples per class varies significantly. This paper proposes a novel label propagated nonnegative matrix factorization method (LPNMF) to handle cleanly labeled but biased data, and its extension LPNMF-E to handle noisily labeled data, within the framework of NMF. LPNMF decomposes the whole dataset into the product of a basis matrix and a coefficient matrix. To propagate labels to unlabeled examples, LPNMF treats the class indicators of labeled examples as their coefficients and iteratively updates both the basis matrix and the coefficients of the unlabeled examples. LPNMF combines the merits of semi-supervised K-Means and label propagation to remedy their respective shortcomings. Specifically, on the one hand, LPNMF learns representative clustering centers from the distribution of the dataset, similar to semi-supervised K-Means, and is thus robust to bridge examples. On the other hand, LPNMF propagates labels according to the affinity between examples, similar to label propagation, and thus relieves the bias problem. Moreover, we introduce an LPNMF extension, LPNMF-E, to handle the noisy-label case by relaxing the constraint on labeled examples.
Since each labeled example also obtains label information from the global distribution of the whole dataset and the local manifold of its neighbors, LPNMF-E outputs reliable class indicators even if a portion of the examples are incorrectly labeled. Theoretical analyses of the generalization ability of the proposed models are also provided. Experimental results on both clean and noisy labeled datasets confirm the effectiveness of LPNMF and LPNMF-E compared with both LP and representative semi-supervised K-Means algorithms. |
doi_str_mv | 10.1109/TKDE.2020.2982387 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2022-01, Vol.34 (1), p.340-351 |
issn | 1041-4347; 1558-2191 |
language | eng |
recordid | cdi_proquest_journals_2607876969 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms; Bridges; Clustering; Clustering algorithms; Coefficients; Datasets; Factorization; Indicators; Information retrieval; K-means; label propagation; Labels; Manifolds; Matrix decomposition; Noise measurement; Nonnegative matrix factorization; Propagation; Semi-supervised learning; Semisupervised learning; Symmetric matrices |
title | Label Propagated Nonnegative Matrix Factorization for Clustering |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T03%3A06%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Label%20Propagated%20Nonnegative%20Matrix%20Factorization%20for%20Clustering&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Lan,%20Long&rft.date=2022-01-01&rft.volume=34&rft.issue=1&rft.spage=340&rft.epage=351&rft.pages=340-351&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2020.2982387&rft_dat=%3Cproquest_RIE%3E2607876969%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2607876969&rft_id=info:pmid/&rft_ieee_id=9044402&rfr_iscdi=true |
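The core mechanism the abstract describes — clamp the coefficients of labeled examples to their class indicators, then alternately update the basis matrix and the coefficients of the unlabeled examples — can be sketched with standard multiplicative NMF update rules (Lee-Seung). This is an illustrative reconstruction, not the paper's exact algorithm; the function name `lpnmf_sketch` and the toy data are invented for the example.

```python
import numpy as np

def lpnmf_sketch(X, Y_l, n_labeled, k, n_iter=300, eps=1e-9):
    """Factor X (d x n) as W @ H with nonnegative W and H.

    The first `n_labeled` columns of H are clamped to the class
    indicators Y_l (k x n_labeled); only the basis W and the
    coefficients of the unlabeled examples are updated, using
    standard multiplicative NMF rules.
    """
    d, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((d, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    H[:, :n_labeled] = Y_l                       # clamp labeled coefficients
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # update the basis matrix
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H[:, n_labeled:] = H_new[:, n_labeled:]  # update unlabeled columns only
    return W, H

# Toy data: class 0 lives in the first three features, class 1 in the
# last two; 4 labeled points (2 per class) followed by 4 unlabeled.
rng = np.random.default_rng(1)
cls0 = np.vstack([rng.uniform(1.5, 2.5, (3, 4)), rng.uniform(0.0, 0.1, (2, 4))])
cls1 = np.vstack([rng.uniform(0.0, 0.1, (3, 4)), rng.uniform(1.5, 2.5, (2, 4))])
X = np.hstack([cls0[:, :2], cls1[:, :2], cls0[:, 2:], cls1[:, 2:]])
Y_l = np.array([[1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]])           # indicators of the labeled points
W, H = lpnmf_sketch(X, Y_l, n_labeled=4, k=2)
labels = H[:, 4:].argmax(axis=0)                 # predicted classes of unlabeled points
```

Because the clamped indicators tie each basis column to one class, the argmax over the learned coefficients assigns each unlabeled point to the class whose basis best reconstructs it — the label-propagation-through-factorization idea the abstract attributes to LPNMF, without the paper's additional machinery (e.g. the relaxed constraints of LPNMF-E).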