CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning

The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categ...

Detailed description

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering 2019-05, Vol.31 (5), p.853-866
Main authors: Jian, Songlei, Pang, Guansong, Cao, Longbing, Lu, Kai, Gao, Hang
Format: Article
Language: eng
container_end_page 866
container_issue 5
container_start_page 853
container_title IEEE transactions on knowledge and data engineering
container_volume 31
creator Jian, Songlei
Pang, Guansong
Cao, Longbing
Lu, Kai
Gao, Hang
description The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These show that CURE is flexible for value clustering and coupling learning between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data w.r.t. an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Substantial experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data. CDE and COSH are scalable and stable, linear to data size and quadratic to the number of features, and are insensitive to their parameters.
doi_str_mv 10.1109/TKDE.2018.2848902
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TKDE_2018_2848902</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8395013</ieee_id><sourcerecordid>2203399514</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-32ce2fec1ac4f04b3ef20ab4828dc6ef927031ce3f6faf4ebb43b3afa26abdb43</originalsourceid><addsrcrecordid>eNo9kMFKw0AQhhdRsFYfQLwEPKfu7G7SjTeJrRULSmnPy-x2tqbEJG5SsG9vaoun-Qe-fwY-xm6BjwB49rB8e56MBAc9ElrpjIszNoAk0bGADM77zBXESqrxJbtq2y3nXI81DNhHvlpMHqNpST-FLSnKsaNNHQqHZfSMHUYLagK1VHXYFXUV2X00KyhgcJ9_TF7vmrKoNtGcMFR9uGYXHsuWbk5zyFbTyTKfxfP3l9f8aR47KdMulsKR8OQAnfJcWUlecLRKC712KflMjLkER9KnHr0ia5W0Ej2KFO26X4bs_ni3CfX3jtrObOtdqPqXRgguZZYlcKDgSLlQt20gb5pQfGHYG-DmIM4cxJmDOHMS13fujp2CiP55LbOEg5S_ZVdqhw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2203399514</pqid></control><display><type>article</type><title>CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning</title><source>IEEE Electronic Library (IEL)</source><creator>Jian, Songlei ; Pang, Guansong ; Cao, Longbing ; Lu, Kai ; Gao, Hang</creator><creatorcontrib>Jian, Songlei ; Pang, Guansong ; Cao, Longbing ; Lu, Kai ; Gao, Hang</creatorcontrib><description>The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. 
With two complementary value coupling functions, CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These show that CURE is flexible for value clustering and coupling learning between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data w.r.t. an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Substantial experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data. CDE and COSH are scalable and stable, linear to data size and quadratic to the number of features, and are insensitive to their parameters.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2018.2848902</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Anomaly detection ; Categorical data representation ; Clustering ; coupling learning ; Couplings ; Data analysis ; Data models ; Dimensional stability ; Encoding ; Estimating techniques ; Learning ; non-IID learning ; outlier detection ; Outliers (statistics) ; Representations ; Semantics ; Task analysis ; Task complexity ; Unsupervised learning</subject><ispartof>IEEE transactions on knowledge and data engineering, 2019-05, Vol.31 (5), p.853-866</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-32ce2fec1ac4f04b3ef20ab4828dc6ef927031ce3f6faf4ebb43b3afa26abdb43</citedby><cites>FETCH-LOGICAL-c336t-32ce2fec1ac4f04b3ef20ab4828dc6ef927031ce3f6faf4ebb43b3afa26abdb43</cites><orcidid>0000-0001-5760-6431 ; 0000-0003-1562-9429</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8395013$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8395013$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jian, Songlei</creatorcontrib><creatorcontrib>Pang, Guansong</creatorcontrib><creatorcontrib>Cao, Longbing</creatorcontrib><creatorcontrib>Lu, Kai</creatorcontrib><creatorcontrib>Gao, Hang</creatorcontrib><title>CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. 
With two complementary value coupling functions, CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These show that CURE is flexible for value clustering and coupling learning between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data w.r.t. an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Substantial experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data. CDE and COSH are scalable and stable, linear to data size and quadratic to the number of features, and are insensitive to their parameters.</description><subject>Anomaly detection</subject><subject>Categorical data representation</subject><subject>Clustering</subject><subject>coupling learning</subject><subject>Couplings</subject><subject>Data analysis</subject><subject>Data models</subject><subject>Dimensional stability</subject><subject>Encoding</subject><subject>Estimating techniques</subject><subject>Learning</subject><subject>non-IID learning</subject><subject>outlier detection</subject><subject>Outliers (statistics)</subject><subject>Representations</subject><subject>Semantics</subject><subject>Task analysis</subject><subject>Task complexity</subject><subject>Unsupervised 
learning</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kMFKw0AQhhdRsFYfQLwEPKfu7G7SjTeJrRULSmnPy-x2tqbEJG5SsG9vaoun-Qe-fwY-xm6BjwB49rB8e56MBAc9ElrpjIszNoAk0bGADM77zBXESqrxJbtq2y3nXI81DNhHvlpMHqNpST-FLSnKsaNNHQqHZfSMHUYLagK1VHXYFXUV2X00KyhgcJ9_TF7vmrKoNtGcMFR9uGYXHsuWbk5zyFbTyTKfxfP3l9f8aR47KdMulsKR8OQAnfJcWUlecLRKC712KflMjLkER9KnHr0ia5W0Ej2KFO26X4bs_ni3CfX3jtrObOtdqPqXRgguZZYlcKDgSLlQt20gb5pQfGHYG-DmIM4cxJmDOHMS13fujp2CiP55LbOEg5S_ZVdqhw</recordid><startdate>20190501</startdate><enddate>20190501</enddate><creator>Jian, Songlei</creator><creator>Pang, Guansong</creator><creator>Cao, Longbing</creator><creator>Lu, Kai</creator><creator>Gao, Hang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-5760-6431</orcidid><orcidid>https://orcid.org/0000-0003-1562-9429</orcidid></search><sort><creationdate>20190501</creationdate><title>CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning</title><author>Jian, Songlei ; Pang, Guansong ; Cao, Longbing ; Lu, Kai ; Gao, Hang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-32ce2fec1ac4f04b3ef20ab4828dc6ef927031ce3f6faf4ebb43b3afa26abdb43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Anomaly detection</topic><topic>Categorical data representation</topic><topic>Clustering</topic><topic>coupling learning</topic><topic>Couplings</topic><topic>Data analysis</topic><topic>Data 
models</topic><topic>Dimensional stability</topic><topic>Encoding</topic><topic>Estimating techniques</topic><topic>Learning</topic><topic>non-IID learning</topic><topic>outlier detection</topic><topic>Outliers (statistics)</topic><topic>Representations</topic><topic>Semantics</topic><topic>Task analysis</topic><topic>Task complexity</topic><topic>Unsupervised learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jian, Songlei</creatorcontrib><creatorcontrib>Pang, Guansong</creatorcontrib><creatorcontrib>Cao, Longbing</creatorcontrib><creatorcontrib>Lu, Kai</creatorcontrib><creatorcontrib>Gao, Hang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jian, Songlei</au><au>Pang, Guansong</au><au>Cao, Longbing</au><au>Lu, Kai</au><au>Gao, Hang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning</atitle><jtitle>IEEE transactions on knowledge and data 
engineering</jtitle><stitle>TKDE</stitle><date>2019-05-01</date><risdate>2019</risdate><volume>31</volume><issue>5</issue><spage>853</spage><epage>866</epage><pages>853-866</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These show that CURE is flexible for value clustering and coupling learning between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data w.r.t. an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Substantial experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data. 
CDE and COSH are scalable and stable, linear to data size and quadratic to the number of features, and are insensitive to their parameters.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2018.2848902</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-5760-6431</orcidid><orcidid>https://orcid.org/0000-0003-1562-9429</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2019-05, Vol.31 (5), p.853-866
issn 1041-4347
1558-2191
language eng
recordid cdi_crossref_primary_10_1109_TKDE_2018_2848902
source IEEE Electronic Library (IEL)
subjects Anomaly detection
Categorical data representation
Clustering
coupling learning
Couplings
Data analysis
Data models
Dimensional stability
Encoding
Estimating techniques
Learning
non-IID learning
outlier detection
Outliers (statistics)
Representations
Semantics
Task analysis
Task complexity
Unsupervised learning
title CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T06%3A24%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CURE:%20Flexible%20Categorical%20Data%20Representation%20by%20Hierarchical%20Coupling%20Learning&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Jian,%20Songlei&rft.date=2019-05-01&rft.volume=31&rft.issue=5&rft.spage=853&rft.epage=866&rft.pages=853-866&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2018.2848902&rft_dat=%3Cproquest_RIE%3E2203399514%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2203399514&rft_id=info:pmid/&rft_ieee_id=8395013&rfr_iscdi=true