Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties
The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conven...
Published in: | IEEE transactions on signal processing 2019-11, Vol.67 (22), p.5865-5880 |
---|---|
Main authors: | Poddar, Sunrita ; Jacob, Mathews |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 5880 |
---|---|
container_issue | 22 |
container_start_page | 5865 |
container_title | IEEE transactions on signal processing |
container_volume | 67 |
creator | Poddar, Sunrita ; Jacob, Mathews |
description | The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries. |
doi_str_mv | 10.1109/TSP.2019.2944758 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_2498488998</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8853310</ieee_id><sourcerecordid>2311109078</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</originalsourceid><addsrcrecordid>eNpdkc1LxDAQxYMouq7eBUEKXrx0TZqkTS6C1K8Fv0BFbyFtp7tZuo02reh_b-qui3rKZOY3jzc8hPYIHhGC5fHjw_0owkSOIslYwsUaGhDJSIhZEq_7GnMacpG8bKFt52YYE8ZkvIm2KI1jxrkYoHFada6FxtSTwJbBmW518GzaaXBjnOub53XbGHDB0_fv1tZhaut3-AguOmdsHdxDravWEztoo9SVg93lO0RPF-eP6VV4fXc5Tk-vw5wx1oZZhnVGZJTlTETeh_eISw4QcU54CbQsoACZ8YJCTHzFkhIEZLHwewmPCR2ik4Xua5fNocjBG9SVem3MXDefymqj_k5qM1UT-64SGUkshBc4Wgo09q0D16q5cTlUla7Bdk5FTAomhJQ9evgPndmu8Qd7ipI-Apz0FF5QeWOda6BcmSFY9ZDyOak-J7XMya8c_D5itfATjAf2F4ABgNVYCE4pwfQLA3uXYw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2311109078</pqid></control><display><type>article</type><title>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</title><source>IEEE Electronic Library (IEL)</source><creator>Poddar, Sunrita ; Jacob, Mathews</creator><creatorcontrib>Poddar, Sunrita ; Jacob, Mathews</creatorcontrib><description>The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. 
Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.</description><identifier>ISSN: 1053-587X</identifier><identifier>EISSN: 1941-0476</identifier><identifier>DOI: 10.1109/TSP.2019.2944758</identifier><identifier>PMID: 33664558</identifier><identifier>CODEN: ITPRED</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithms ; Clustering ; Clustering algorithms ; Clustering methods ; Coherence ; Computer simulation ; Data points ; Datasets ; Fines & penalties ; fusion penalties ; Gene expression ; missing entries ; Optimization ; Pattern recognition ; Recommender systems ; Signal processing algorithms</subject><ispartof>IEEE transactions on signal processing, 2019-11, Vol.67 (22), p.5865-5880</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</citedby><cites>FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</cites><orcidid>0000-0001-6196-3933 ; 0000-0002-1853-6423</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8853310$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,314,780,784,796,885,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8853310$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33664558$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Poddar, Sunrita</creatorcontrib><creatorcontrib>Jacob, Mathews</creatorcontrib><title>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</title><title>IEEE transactions on signal processing</title><addtitle>TSP</addtitle><addtitle>IEEE Trans Signal Process</addtitle><description>The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. 
Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clustering methods</subject><subject>Coherence</subject><subject>Computer simulation</subject><subject>Data points</subject><subject>Datasets</subject><subject>Fines & penalties</subject><subject>fusion penalties</subject><subject>Gene expression</subject><subject>missing entries</subject><subject>Optimization</subject><subject>Pattern recognition</subject><subject>Recommender systems</subject><subject>Signal processing algorithms</subject><issn>1053-587X</issn><issn>1941-0476</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkc1LxDAQxYMouq7eBUEKXrx0TZqkTS6C1K8Fv0BFbyFtp7tZuo02reh_b-qui3rKZOY3jzc8hPYIHhGC5fHjw_0owkSOIslYwsUaGhDJSIhZEq_7GnMacpG8bKFt52YYE8ZkvIm2KI1jxrkYoHFada6FxtSTwJbBmW518GzaaXBjnOub53XbGHDB0_fv1tZhaut3-AguOmdsHdxDravWEztoo9SVg93lO0RPF-eP6VV4fXc5Tk-vw5wx1oZZhnVGZJTlTETeh_eISw4QcU54CbQsoACZ8YJCTHzFkhIEZLHwewmPCR2ik4Xua5fNocjBG9SVem3MXDefymqj_k5qM1UT-64SGUkshBc4Wgo09q0D16q5cTlUla7Bdk5FTAomhJQ9evgPndmu8Qd7ipI-Apz0FF5QeWOda6BcmSFY9ZDyOak-J7XMya8c_D5itfATjAf2F4ABgNVYCE4pwfQLA3uXYw</recordid><startdate>20191115</startdate><enddate>20191115</enddate><creator>Poddar, Sunrita</creator><creator>Jacob, Mathews</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-6196-3933</orcidid><orcidid>https://orcid.org/0000-0002-1853-6423</orcidid></search><sort><creationdate>20191115</creationdate><title>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</title><author>Poddar, Sunrita ; Jacob, Mathews</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clustering methods</topic><topic>Coherence</topic><topic>Computer simulation</topic><topic>Data points</topic><topic>Datasets</topic><topic>Fines & penalties</topic><topic>fusion penalties</topic><topic>Gene expression</topic><topic>missing entries</topic><topic>Optimization</topic><topic>Pattern recognition</topic><topic>Recommender systems</topic><topic>Signal processing algorithms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Poddar, Sunrita</creatorcontrib><creatorcontrib>Jacob, Mathews</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>IEEE transactions on signal processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Poddar, Sunrita</au><au>Jacob, Mathews</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</atitle><jtitle>IEEE transactions on signal processing</jtitle><stitle>TSP</stitle><addtitle>IEEE Trans Signal Process</addtitle><date>2019-11-15</date><risdate>2019</risdate><volume>67</volume><issue>22</issue><spage>5865</spage><epage>5880</epage><pages>5865-5880</pages><issn>1053-587X</issn><eissn>1941-0476</eissn><coden>ITPRED</coden><abstract>The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. 
It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>33664558</pmid><doi>10.1109/TSP.2019.2944758</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0001-6196-3933</orcidid><orcidid>https://orcid.org/0000-0002-1853-6423</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1053-587X |
ispartof | IEEE transactions on signal processing, 2019-11, Vol.67 (22), p.5865-5880 |
issn | 1053-587X ; 1941-0476 |
language | eng |
recordid | cdi_proquest_miscellaneous_2498488998 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms ; Clustering ; Clustering algorithms ; Clustering methods ; Coherence ; Computer simulation ; Data points ; Datasets ; Fines & penalties ; fusion penalties ; Gene expression ; missing entries ; Optimization ; Pattern recognition ; Recommender systems ; Signal processing algorithms |
title | Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A26%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20of%20Data%20With%20Missing%20Entries%20Using%20Non-Convex%20Fusion%20Penalties&rft.jtitle=IEEE%20transactions%20on%20signal%20processing&rft.au=Poddar,%20Sunrita&rft.date=2019-11-15&rft.volume=67&rft.issue=22&rft.spage=5865&rft.epage=5880&rft.pages=5865-5880&rft.issn=1053-587X&rft.eissn=1941-0476&rft.coden=ITPRED&rft_id=info:doi/10.1109/TSP.2019.2944758&rft_dat=%3Cproquest_RIE%3E2311109078%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2311109078&rft_id=info:pmid/33664558&rft_ieee_id=8853310&rfr_iscdi=true |