Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties

The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conven...

Full description

Bibliographic details
Published in: IEEE transactions on signal processing 2019-11, Vol.67 (22), p.5865-5880
Main authors: Poddar, Sunrita, Jacob, Mathews
Format: Article
Language: English
container_end_page 5880
container_issue 22
container_start_page 5865
container_title IEEE transactions on signal processing
container_volume 67
creator Poddar, Sunrita
Jacob, Mathews
description The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using an l_0 fusion-penalty-based optimization problem. Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset, and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.
doi_str_mv 10.1109/TSP.2019.2944758
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_2498488998</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8853310</ieee_id><sourcerecordid>2311109078</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</originalsourceid><addsrcrecordid>eNpdkc1LxDAQxYMouq7eBUEKXrx0TZqkTS6C1K8Fv0BFbyFtp7tZuo02reh_b-qui3rKZOY3jzc8hPYIHhGC5fHjw_0owkSOIslYwsUaGhDJSIhZEq_7GnMacpG8bKFt52YYE8ZkvIm2KI1jxrkYoHFada6FxtSTwJbBmW518GzaaXBjnOub53XbGHDB0_fv1tZhaut3-AguOmdsHdxDravWEztoo9SVg93lO0RPF-eP6VV4fXc5Tk-vw5wx1oZZhnVGZJTlTETeh_eISw4QcU54CbQsoACZ8YJCTHzFkhIEZLHwewmPCR2ik4Xua5fNocjBG9SVem3MXDefymqj_k5qM1UT-64SGUkshBc4Wgo09q0D16q5cTlUla7Bdk5FTAomhJQ9evgPndmu8Qd7ipI-Apz0FF5QeWOda6BcmSFY9ZDyOak-J7XMya8c_D5itfATjAf2F4ABgNVYCE4pwfQLA3uXYw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2311109078</pqid></control><display><type>article</type><title>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</title><source>IEEE Electronic Library (IEL)</source><creator>Poddar, Sunrita ; Jacob, Mathews</creator><creatorcontrib>Poddar, Sunrita ; Jacob, Mathews</creatorcontrib><description>The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. 
Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.</description><identifier>ISSN: 1053-587X</identifier><identifier>EISSN: 1941-0476</identifier><identifier>DOI: 10.1109/TSP.2019.2944758</identifier><identifier>PMID: 33664558</identifier><identifier>CODEN: ITPRED</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithms ; Clustering ; Clustering algorithms ; Clustering methods ; Coherence ; Computer simulation ; Data points ; Datasets ; Fines &amp; penalties ; fusion penalties ; Gene expression ; missing entries ; Optimization ; Pattern recognition ; Recommender systems ; Signal processing algorithms</subject><ispartof>IEEE transactions on signal processing, 2019-11, Vol.67 (22), p.5865-5880</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</citedby><cites>FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</cites><orcidid>0000-0001-6196-3933 ; 0000-0002-1853-6423</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8853310$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,314,780,784,796,885,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8853310$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33664558$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Poddar, Sunrita</creatorcontrib><creatorcontrib>Jacob, Mathews</creatorcontrib><title>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</title><title>IEEE transactions on signal processing</title><addtitle>TSP</addtitle><addtitle>IEEE Trans Signal Process</addtitle><description>The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. 
Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clustering methods</subject><subject>Coherence</subject><subject>Computer simulation</subject><subject>Data points</subject><subject>Datasets</subject><subject>Fines &amp; penalties</subject><subject>fusion penalties</subject><subject>Gene expression</subject><subject>missing entries</subject><subject>Optimization</subject><subject>Pattern recognition</subject><subject>Recommender systems</subject><subject>Signal processing algorithms</subject><issn>1053-587X</issn><issn>1941-0476</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkc1LxDAQxYMouq7eBUEKXrx0TZqkTS6C1K8Fv0BFbyFtp7tZuo02reh_b-qui3rKZOY3jzc8hPYIHhGC5fHjw_0owkSOIslYwsUaGhDJSIhZEq_7GnMacpG8bKFt52YYE8ZkvIm2KI1jxrkYoHFada6FxtSTwJbBmW518GzaaXBjnOub53XbGHDB0_fv1tZhaut3-AguOmdsHdxDravWEztoo9SVg93lO0RPF-eP6VV4fXc5Tk-vw5wx1oZZhnVGZJTlTETeh_eISw4QcU54CbQsoACZ8YJCTHzFkhIEZLHwewmPCR2ik4Xua5fNocjBG9SVem3MXDefymqj_k5qM1UT-64SGUkshBc4Wgo09q0D16q5cTlUla7Bdk5FTAomhJQ9evgPndmu8Qd7ipI-Apz0FF5QeWOda6BcmSFY9ZDyOak-J7XMya8c_D5itfATjAf2F4ABgNVYCE4pwfQLA3uXYw</recordid><startdate>20191115</startdate><enddate>20191115</enddate><creator>Poddar, Sunrita</creator><creator>Jacob, Mathews</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-6196-3933</orcidid><orcidid>https://orcid.org/0000-0002-1853-6423</orcidid></search><sort><creationdate>20191115</creationdate><title>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</title><author>Poddar, Sunrita ; Jacob, Mathews</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-bb0ab192bc4823361050f5ee25515fe3fdede9b5d3e61de947fe8eb68b0a75613</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clustering methods</topic><topic>Coherence</topic><topic>Computer simulation</topic><topic>Data points</topic><topic>Datasets</topic><topic>Fines &amp; penalties</topic><topic>fusion penalties</topic><topic>Gene expression</topic><topic>missing entries</topic><topic>Optimization</topic><topic>Pattern recognition</topic><topic>Recommender systems</topic><topic>Signal processing algorithms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Poddar, Sunrita</creatorcontrib><creatorcontrib>Jacob, Mathews</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>IEEE transactions on signal processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Poddar, Sunrita</au><au>Jacob, Mathews</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties</atitle><jtitle>IEEE transactions on signal processing</jtitle><stitle>TSP</stitle><addtitle>IEEE Trans Signal Process</addtitle><date>2019-11-15</date><risdate>2019</risdate><volume>67</volume><issue>22</issue><spage>5865</spage><epage>5880</epage><pages>5865-5880</pages><issn>1053-587X</issn><eissn>1941-0476</eissn><coden>ITPRED</coden><abstract>The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional algorithms for clustering data assume that all the feature values are known for every data point. We propose a method to cluster data in the presence of missing information. Unlike conventional clustering techniques where every feature is known for each point, our algorithm can handle cases where a few feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a l_0 fusion penalty based optimization problem. Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. 
It is observed that this algorithm produces solutions that degrade gradually with an increase in the fraction of missing feature values. We demonstrate the utility of the proposed method using a simulated dataset, the Wine dataset and the ASL dataset. It is shown that the proposed method is a promising clustering technique for datasets with large fractions of missing entries.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>33664558</pmid><doi>10.1109/TSP.2019.2944758</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0001-6196-3933</orcidid><orcidid>https://orcid.org/0000-0002-1853-6423</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1053-587X
ispartof IEEE transactions on signal processing, 2019-11, Vol.67 (22), p.5865-5880
issn 1053-587X
1941-0476
language eng
recordid cdi_proquest_miscellaneous_2498488998
source IEEE Electronic Library (IEL)
subjects Algorithms
Clustering
Clustering algorithms
Clustering methods
Coherence
Computer simulation
Data points
Datasets
Fines & penalties
fusion penalties
Gene expression
missing entries
Optimization
Pattern recognition
Recommender systems
Signal processing algorithms
title Clustering of Data With Missing Entries Using Non-Convex Fusion Penalties
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A26%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20of%20Data%20With%20Missing%20Entries%20Using%20Non-Convex%20Fusion%20Penalties&rft.jtitle=IEEE%20transactions%20on%20signal%20processing&rft.au=Poddar,%20Sunrita&rft.date=2019-11-15&rft.volume=67&rft.issue=22&rft.spage=5865&rft.epage=5880&rft.pages=5865-5880&rft.issn=1053-587X&rft.eissn=1941-0476&rft.coden=ITPRED&rft_id=info:doi/10.1109/TSP.2019.2944758&rft_dat=%3Cproquest_RIE%3E2311109078%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2311109078&rft_id=info:pmid/33664558&rft_ieee_id=8853310&rfr_iscdi=true