Optimal Projections in the Distance-Based Statistical Methods

This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directi...

Bibliographic details
Published in: arXiv.org 2019-11
Main authors: Yu, Chuanping; Huo, Xiaoming
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Yu, Chuanping
Huo, Xiaoming
description This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently using the fast algorithm developed in Huo and Székely [2016] for univariate variables, the computational complexity can be improved from \(O(m^2)\) to \(O(n m \cdot \log(m))\), where \(n\) is the number of projection directions and \(m\) is the sample size. When \(n \ll m/\log(m)\), computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. These can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either \(2\) or the number of projection directions. In the generic settings, we propose an algorithm to find some approximate solutions. Simulations confirm the advantage of our method, in comparison with the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2313442902</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2313442902</sourcerecordid><originalsourceid>FETCH-proquest_journals_23134429023</originalsourceid><addsrcrecordid>eNqNissKwjAQAIMgWLT_EPBcSDetj4MXX3gRBb2X0K40oSa1u_1_c_ADPA3MzEQkoHWebQqAmUiJnFIKVmsoS52I3a1n-zadvA_BYc02eJLWS25RHi2x8TVme0PYyAcbjsbW8b4it6GhhZi-TEeY_jgXy_Ppebhk_RA-IxJXLoyDj6kCneuigK0C_d_1BXLAN8w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2313442902</pqid></control><display><type>article</type><title>Optimal Projections in the Distance-Based Statistical Methods</title><source>Free E- Journals</source><creator>Yu, Chuanping ; Huo, Xiaoming</creator><creatorcontrib>Yu, Chuanping ; Huo, Xiaoming</creatorcontrib><description>This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently utilizing the fast algorithm that is developed in Huo and Székely [2016] for the univariate variables, the computational complexity can be improved from \(O(m^2)\) to \(O(n m \cdot \mbox{log}(m))\), where \(n\) is the number of projection directions and \(m\) is the sample size. When \(n \ll m/\log(m)\), computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. This can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. 
In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either \(2\) or the number of projection directions. In the generic settings, we propose an algorithm to find some approximate solutions. Simulations confirm the advantage of our method, in comparison with the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Computer simulation ; Multivariate analysis ; Optimization ; Projection ; Statistical methods</subject><ispartof>arXiv.org, 2019-11</ispartof><rights>2019. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Yu, Chuanping</creatorcontrib><creatorcontrib>Huo, Xiaoming</creatorcontrib><title>Optimal Projections in the Distance-Based Statistical Methods</title><title>arXiv.org</title><description>This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. 
The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently utilizing the fast algorithm that is developed in Huo and Székely [2016] for the univariate variables, the computational complexity can be improved from \(O(m^2)\) to \(O(n m \cdot \mbox{log}(m))\), where \(n\) is the number of projection directions and \(m\) is the sample size. When \(n \ll m/\log(m)\), computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. This can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either \(2\) or the number of projection directions. In the generic settings, we propose an algorithm to find some approximate solutions. 
Simulations confirm the advantage of our method, in comparison with the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.</description><subject>Algorithms</subject><subject>Computer simulation</subject><subject>Multivariate analysis</subject><subject>Optimization</subject><subject>Projection</subject><subject>Statistical methods</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNissKwjAQAIMgWLT_EPBcSDetj4MXX3gRBb2X0K40oSa1u_1_c_ADPA3MzEQkoHWebQqAmUiJnFIKVmsoS52I3a1n-zadvA_BYc02eJLWS25RHi2x8TVme0PYyAcbjsbW8b4it6GhhZi-TEeY_jgXy_Ppebhk_RA-IxJXLoyDj6kCneuigK0C_d_1BXLAN8w</recordid><startdate>20191107</startdate><enddate>20191107</enddate><creator>Yu, Chuanping</creator><creator>Huo, Xiaoming</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20191107</creationdate><title>Optimal Projections in the Distance-Based Statistical Methods</title><author>Yu, Chuanping ; Huo, Xiaoming</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_23134429023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Computer simulation</topic><topic>Multivariate 
analysis</topic><topic>Optimization</topic><topic>Projection</topic><topic>Statistical methods</topic><toplevel>online_resources</toplevel><creatorcontrib>Yu, Chuanping</creatorcontrib><creatorcontrib>Huo, Xiaoming</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yu, Chuanping</au><au>Huo, Xiaoming</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Optimal Projections in the Distance-Based Statistical Methods</atitle><jtitle>arXiv.org</jtitle><date>2019-11-07</date><risdate>2019</risdate><eissn>2331-8422</eissn><abstract>This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. 
The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently utilizing the fast algorithm that is developed in Huo and Székely [2016] for the univariate variables, the computational complexity can be improved from \(O(m^2)\) to \(O(n m \cdot \mbox{log}(m))\), where \(n\) is the number of projection directions and \(m\) is the sample size. When \(n \ll m/\log(m)\), computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. This can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either \(2\) or the number of projection directions. In the generic settings, we propose an algorithm to find some approximate solutions. Simulations confirm the advantage of our method, in comparison with the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2019-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_2313442902
source Free E- Journals
subjects Algorithms
Computer simulation
Multivariate analysis
Optimization
Projection
Statistical methods
title Optimal Projections in the Distance-Based Statistical Methods
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T03%3A46%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimal%20Projections%20in%20the%20Distance-Based%20Statistical%20Methods&rft.jtitle=arXiv.org&rft.au=Yu,%20Chuanping&rft.date=2019-11-07&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2313442902%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2313442902&rft_id=info:pmid/&rfr_iscdi=true
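
The projection idea summarized in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation. It approximates the sum of pairwise Euclidean distances by projecting the data onto n fixed directions and summing univariate absolute differences, each in O(m log m) time by sorting. The sorting step is a simple stand-in for the fast univariate algorithm of Huo and Székely, which is not reproduced here. The equally spaced directions on the half circle are an illustrative assumption for dimension 2, and all function names are hypothetical. The scaling constant uses the standard fact that, for u uniform on the unit sphere in R^p and any unit vector v, E|u^T v| = Γ(p/2) / (√π Γ((p+1)/2)).

```python
import math
import numpy as np


def pairwise_abs_sum_1d(x):
    """Sum of |x_i - x_j| over all pairs i < j, in O(m log m) via sorting."""
    xs = np.sort(x)
    m = xs.shape[0]
    # In sorted order (0-indexed), xs[k] is added k times and
    # subtracted (m - 1 - k) times, giving weight 2k - (m - 1).
    weights = 2.0 * np.arange(m) - (m - 1)
    return float(np.dot(weights, xs))


def projected_pairwise_distance_sum(X, directions):
    """Approximate the sum of Euclidean distances over pairs i < j.

    X has shape (m, p); directions has shape (n, p) with unit-norm rows.
    Total cost is O(n m log m) instead of O(m^2).
    """
    p = X.shape[1]
    # c_p = E|u^T v| for a unit vector v and u uniform on the sphere in R^p.
    c_p = math.gamma(p / 2) / (math.sqrt(math.pi) * math.gamma((p + 1) / 2))
    sums = [pairwise_abs_sum_1d(X @ u) for u in directions]
    return float(np.mean(sums)) / c_p


def half_circle_directions(n):
    """Assumed direction set for p = 2: n equally spaced unit vectors."""
    angles = np.pi * np.arange(n) / n
    return np.column_stack([np.cos(angles), np.sin(angles)])


def brute_force_pairwise_distance_sum(X):
    """Exact O(m^2) reference, used only to check the approximation."""
    diffs = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).sum() / 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    approx = projected_pairwise_distance_sum(X, half_circle_directions(16))
    exact = brute_force_pairwise_distance_sum(X)
    print(f"approx={approx:.3f} exact={exact:.3f}")
```

The same averaging applies to randomly drawn directions, which corresponds to the pure Monte Carlo baseline mentioned in the description; the fixed directions here only show how pre-specified directions plug into the computation.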