Fast and Memory-Efficient Approximate Minimum Spanning Tree Generation for Large Datasets

Bibliographic Details
Published in:	Arabian journal for science and engineering (2011) 2025, Vol.50 (2), p.1233-1246
Main authors:	Almansoori, Mahmood K. M.; Meszaros, Andras; Telek, Miklos
Format:	Article
Language:	English
Subjects:	Algorithms; Data points; Datasets; Engineering; Graph theory; Humanities and Social Sciences; multidisciplinary; Research Article-Computer Engineering and Computer Science; Science; Spatial data
Online access:	Full text

Description
Abstract:	Conventional minimum spanning tree (MST) algorithms typically start by creating a distance matrix of the $n(n-1)/2$ pairs of data points, leading to a time complexity of $O(n^2)$. This initial step poses a computational bottleneck. To overcome this limitation, we present a novel method that constructs an initial random $k$-neighbor graph and optimizes this graph by employing a crawling technique to efficiently approximate the $k$ nearest neighbors ($k$NN) graph. This crawling approach allows us to approximate the closest neighbors of each node. Subsequently, the approximate $k$NN graph is utilized to build an initial approximate MST and iteratively refine it by the same crawling process. Using this approach, an approximate MST can be obtained for a data set of size $n$ with empirical cost around $O(n^{1.07})$ and a minimal $O(n)$ memory consumption. In contrast to spatial tree-based approaches, the presented method also scales well to high-dimensional data. We have shown that the proposed approach achieves such a level of performance with only a marginal accuracy reduction between 0.5% and 6%. These qualities make it an excellent candidate for approximate MST calculation for high-dimensional, large data sets.
ISSN:	2193-567X 1319-8025 2191-4281
DOI:	10.1007/s13369-024-08974-y
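
The abstract outlines the pipeline only at a high level: a random k-neighbor graph, a crawling refinement toward the kNN graph, and an MST built from that graph. The following is a minimal sketch of that outline, not the authors' implementation. It assumes the crawling step can be approximated by a neighbor-of-neighbor search, and it uses Kruskal's algorithm on the resulting edges. All function names, the default `k`, and the `rounds` parameter are hypothetical. The paper's final MST refinement step is not reproduced, so the result may be a spanning forest if the approximate kNN graph is disconnected.

```python
import math
import random


def random_knn_graph(points, k, rng):
    """Give every node k distinct random neighbors (requires k <= n - 1)."""
    n = len(points)
    graph = []
    for i in range(n):
        # Sample from n - 1 slots and shift past i so a node never picks itself.
        picks = [j if j < i else j + 1 for j in rng.sample(range(n - 1), k)]
        graph.append(sorted((math.dist(points[i], points[j]), j) for j in picks))
    return graph


def crawl_refine(points, graph, k, rounds=10):
    """Assumed stand-in for 'crawling': try neighbors of neighbors, keep the k closest."""
    for _ in range(rounds):
        changed = False
        for i in range(len(points)):
            current = {j for _d, j in graph[i]}
            candidates = {
                m
                for _d, j in graph[i]
                for _e, m in graph[j]
                if m != i and m not in current
            }
            merged = graph[i] + [(math.dist(points[i], points[m]), m) for m in candidates]
            best = sorted(merged)[:k]
            if best != graph[i]:
                graph[i] = best
                changed = True
        if not changed:  # no neighbor list improved, so stop early
            break
    return graph


def approximate_mst(points, k=5, rounds=10, seed=0):
    """Run Kruskal's algorithm on the edges of the refined approximate kNN graph."""
    rng = random.Random(seed)
    graph = crawl_refine(points, random_knn_graph(points, k, rng), k, rounds)
    edges = sorted(
        {(d, min(i, j), max(i, j)) for i, nbrs in enumerate(graph) for d, j in nbrs}
    )
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            tree.append((i, j, d))
    return tree
```

Calling `approximate_mst(points, k=5)` on a list of coordinate tuples returns a list of `(i, j, distance)` edges. Memory stays proportional to `n * k` because only the neighbor lists are stored, never the full distance matrix; the running time of this sketch was not measured and should not be read as matching the empirical cost reported in the abstract.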