Learning A Disentangling Representation For PU Learning

Bibliographic details
Main authors:	Zamzam, Omar; Akrami, Haleh; Soltanolkotabi, Mahdi; Leahy, Richard
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning

creator	Zamzam, Omar; Akrami, Haleh; Soltanolkotabi, Mahdi; Leahy, Richard
description	In this paper, we address the problem of learning a binary (positive vs. negative) classifier given Positive and Unlabeled data, a setting commonly referred to as PU learning. Although rudimentary techniques like clustering, out-of-distribution detection, or positive density estimation can be used to solve the problem in low-dimensional settings, their efficacy progressively deteriorates with higher dimensions due to the increasing complexity of the data distribution. In this paper, we propose to learn a neural network-based data representation using a loss function that can be used to project the unlabeled data into two (positive and negative) clusters that can be easily identified using simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings. We adopt a vector quantization technique for the learned representations to amplify the separation between the learned unlabeled data clusters. We conduct experiments on simulated PU data that demonstrate the improved performance of our proposed method compared to the current state-of-the-art approaches. We also provide some theoretical justification for our two-cluster-based approach and our algorithmic choices.
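The abstract relies on two generic ingredients: simple clustering that separates unlabeled data into a positive and a negative cluster, and vector quantization that snaps representations onto a small set of code vectors. The sketch below illustrates only those ingredients in pure Python, under stated assumptions: the clusterer is plain 2-means (Lloyd's algorithm), the cluster whose center lies nearest the mean of the labeled positives is declared positive, and quantization is hard nearest-codebook assignment. The names `two_means`, `pu_cluster_classify`, and `quantize` are hypothetical; this is not the authors' loss function, network, or quantization scheme.

```python
import random

Vec = tuple[float, ...]


def dist2(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: list[Vec]) -> Vec:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def two_means(points: list[Vec], iters: int = 50, seed: int = 0):
    """Plain 2-means (Lloyd's algorithm); returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearer center.
        labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                  for p in points]
        # Update step: move each center to the mean of its members
        # (keep the old center if a cluster happens to be empty).
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def pu_cluster_classify(positives: list[Vec], unlabeled: list[Vec]) -> list[int]:
    """Hypothetical PU rule: the cluster nearest the labeled-positive mean is positive."""
    centers, labels = two_means(unlabeled)
    pos_mean = mean(positives)
    pos_cluster = min((0, 1), key=lambda k: dist2(centers[k], pos_mean))
    return [1 if lab == pos_cluster else 0 for lab in labels]


def quantize(z: Vec, codebook: list[Vec]) -> Vec:
    """Hard vector quantization: replace z by its nearest codebook vector."""
    return min(codebook, key=lambda c: dist2(z, c))


# Toy 2-D data standing in for learned representations.
positives = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)]
unlabeled = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.1),
             (5.0, 5.1), (4.9, 5.0), (5.2, 4.8)]
print(pu_cluster_classify(positives, unlabeled))  # [1, 1, 1, 0, 0, 0]
print(quantize((4.0, 4.0), [(0.0, 0.0), (5.0, 5.0)]))  # (5.0, 5.0)
```

In the setting the abstract describes, the clustering step would operate on learned neural-network representations rather than raw features; the well-separated 2-D toy points here simply stand in for such representations.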
doi_str_mv	10.48550/arxiv.2310.03833
format	Article
creationdate	2023-10-05
rights	http://creativecommons.org/licenses/by/4.0
linktorsrc	https://arxiv.org/abs/2310.03833
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2310.03833
language	eng
recordid	cdi_arxiv_primary_2310_03833
source	arXiv.org
subjects	Computer Science - Learning
title	Learning A Disentangling Representation For PU Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T06%3A23%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20A%20Disentangling%20Representation%20For%20PU%20Learning&rft.au=Zamzam,%20Omar&rft.date=2023-10-05&rft_id=info:doi/10.48550/arxiv.2310.03833&rft_dat=%3Carxiv_GOX%3E2310_03833%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true