Learning A Disentangling Representation For PU Learning

Bibliographic details
Main authors: Zamzam, Omar; Akrami, Haleh; Soltanolkotabi, Mahdi; Leahy, Richard
Format: Article
Language: English
creator Zamzam, Omar
Akrami, Haleh
Soltanolkotabi, Mahdi
Leahy, Richard
description In this paper, we address the problem of learning a binary (positive vs. negative) classifier from positive and unlabeled data, a setting commonly referred to as PU learning. Although rudimentary techniques such as clustering, out-of-distribution detection, or positive density estimation can solve the problem in low-dimensional settings, their efficacy progressively deteriorates in higher dimensions as the data distribution grows more complex. We propose to learn a neural network-based data representation with a loss function designed to project the unlabeled data into two (positive and negative) clusters that simple clustering techniques can easily identify, effectively emulating the phenomenon observed in low-dimensional settings. We apply vector quantization to the learned representations to amplify the separation between the resulting clusters of unlabeled data. Experiments on simulated PU data show that the proposed method outperforms current state-of-the-art approaches. We also provide some theoretical justification for our two-cluster approach and our algorithmic choices.
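The two-cluster idea in the description can be illustrated with a small toy sketch. It is not the authors' method: the trained encoder and its loss are replaced by synthetic 2-D embeddings, the vector-quantization codebook is fixed by hand instead of being learned, and plain 2-means (Lloyd's algorithm) stands in for the "simple clustering techniques". The rule that the cluster holding most labeled positives is the positive cluster, and helper names such as `pu_cluster_labels`, are assumptions made for illustration.

```python
import numpy as np


def quantize(z, codebook):
    """Vector quantization: map each row of z to its nearest codebook vector."""
    # Squared Euclidean distance from every embedding to every code vector.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx


def two_means(z, n_iter=100):
    """Lloyd's algorithm with k=2; returns a 0/1 cluster label per row of z."""
    # Deterministic farthest-point seeding: start from the point farthest from
    # the global mean, then take the point farthest from that one.
    c0 = z[((z - z.mean(axis=0)) ** 2).sum(axis=1).argmax()]
    c1 = z[((z - c0) ** 2).sum(axis=1).argmax()]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster became empty.
        new_centers = np.stack([
            z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def pu_cluster_labels(z_unlabeled, z_positive):
    """Hypothetical helper: predict 1 (positive) / 0 (negative) for unlabeled rows.

    Labeled positives are clustered together with the unlabeled data, and the
    cluster that contains most of them is declared the positive cluster.
    """
    z_all = np.vstack([z_unlabeled, z_positive])
    labels = two_means(z_all)
    pos_cluster = np.bincount(labels[len(z_unlabeled):], minlength=2).argmax()
    return (labels[: len(z_unlabeled)] == pos_cluster).astype(int)


# Toy usage with synthetic stand-ins for learned 2-D representations.
rng = np.random.default_rng(0)
z_pos_labeled = rng.normal([3.0, 0.0], 0.5, size=(50, 2))
z_unlabeled = np.vstack([
    rng.normal([3.0, 0.0], 0.5, size=(100, 2)),   # hidden positives
    rng.normal([-3.0, 0.0], 0.5, size=(100, 2)),  # hidden negatives
])
y_true = np.r_[np.ones(100, dtype=int), np.zeros(100, dtype=int)]

# Hand-picked codebook (a real model would learn it).
codebook = np.array([[3.0, 1.0], [3.0, -1.0], [-3.0, 1.0], [-3.0, -1.0]])
q_unlabeled, _ = quantize(z_unlabeled, codebook)
q_positive, _ = quantize(z_pos_labeled, codebook)

y_pred = pu_cluster_labels(q_unlabeled, q_positive)
accuracy = (y_pred == y_true).mean()
print(f"accuracy on unlabeled data: {accuracy:.3f}")
```

On toy data this well separated, quantizing to a few code vectors collapses each group onto a handful of points, which gives one intuition for why quantization might sharpen cluster separation; real high-dimensional data will rarely be this clean.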
doi_str_mv 10.48550/arxiv.2310.03833
format Article
date 2023-10-05
rights http://creativecommons.org/licenses/by/4.0
linktorsrc https://arxiv.org/abs/2310.03833
identifier DOI: 10.48550/arxiv.2310.03833
language eng
source arXiv.org
subjects Computer Science - Learning
title Learning A Disentangling Representation For PU Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T06%3A23%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20A%20Disentangling%20Representation%20For%20PU%20Learning&rft.au=Zamzam,%20Omar&rft.date=2023-10-05&rft_id=info:doi/10.48550/arxiv.2310.03833&rft_dat=%3Carxiv_GOX%3E2310_03833%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true