Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction

Dimensionality reduction is crucial both for visualization and preprocessing high dimensional data for machine learning. We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space which is used to preserve the grouping properties of the data distributio...

Bibliographic details
Main authors: Sarfraz, M. Saquib; Koulakis, Marios; Seibold, Constantin; Stiefelhagen, Rainer
Format: Article
Language: eng
creator Sarfraz, M. Saquib
Koulakis, Marios
Seibold, Constantin
Stiefelhagen, Rainer
description Dimensionality reduction is crucial both for visualization and preprocessing high dimensional data for machine learning. We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space which is used to preserve the grouping properties of the data distribution on multiple levels. The core of the proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP in performance and visualization quality while being an order of magnitude faster in run-time. Furthermore, its interpretable mechanics, the ability to project new data, and the natural separation of data clusters in visualizations make it a general purpose unsupervised dimension reduction technique. In the paper, we argue about the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K. We perform comparisons with other state-of-the-art methods on multiple metrics and target dimensions highlighting its efficiency and performance. Code is available at https://github.com/koulakis/h-nne
doi_str_mv 10.48550/arxiv.2203.12997
format Article
identifier DOI: 10.48550/arxiv.2203.12997
language eng
recordid cdi_arxiv_primary_2203_12997
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Data Structures and Algorithms
Computer Science - Graphics
title Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T22%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hierarchical%20Nearest%20Neighbor%20Graph%20Embedding%20for%20Efficient%20Dimensionality%20Reduction&rft.au=Sarfraz,%20M.%20Saquib&rft.date=2022-03-24&rft_id=info:doi/10.48550/arxiv.2203.12997&rft_dat=%3Carxiv_GOX%3E2203_12997%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
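
The description in this record mentions a hierarchy built on 1-nearest-neighbor graphs. The Python sketch below is only a rough illustration of that idea and is not the authors' implementation (their code is at the GitHub URL given in the description). It rests on assumptions that the record does not state: Euclidean distance, brute-force neighbor search, clusters taken as connected components of the undirected 1-NN graph, and each cluster represented by the mean of its original points at the next level. The function names are hypothetical.

```python
import numpy as np


def one_nn_indices(points):
    """For each row, return the index of its nearest other row (brute force)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.argmin(axis=1)


def components_of_1nn_graph(nearest):
    """Label connected components of the undirected graph with edges i -- nearest[i]."""
    parent = list(range(len(nearest)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        ri, rj = find(i), find(int(j))
        if ri != rj:
            parent[ri] = rj

    roots = np.array([find(i) for i in range(len(nearest))])
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def build_hierarchy(points):
    """Return one label array per level, each indexed by original point.

    Assumption: this is an illustrative sketch, not the method from the paper.
    """
    pts = np.asarray(points, dtype=float)
    current = pts
    membership = np.arange(len(pts))  # original point -> node at current level
    levels = []
    while len(current) > 1:
        labels = components_of_1nn_graph(one_nn_indices(current))
        membership = labels[membership]
        levels.append(membership.copy())
        n_clusters = int(labels.max()) + 1
        # Represent each cluster by the mean of its original points.
        current = np.stack(
            [pts[membership == c].mean(axis=0) for c in range(n_clusters)]
        )
    return levels
```

Every node in a 1-NN graph has an edge to some other node, so each component contains at least two nodes and the number of clusters at least halves per level; the loop therefore ends after at most about log2(n) levels. The brute-force distance matrix needs O(n²) memory, so this sketch only suits small inputs.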