An Approximate Maximin-Directed Random Sampling for Clustering Applications

Bibliographic details
Published in:	Revue d'Intelligence Artificielle 2023-12, Vol.37 (6), p.1415-1421
Main authors:	Hasan, Khamees Khalaf; Ibrahim, Omar A.; Dham, Mahmood Ali A.
Format:	Article
Language:	English
Subjects:	Algorithms; Big Data; Cluster analysis; Clustering; Complexity; Datasets; Feasibility studies; Methods; Prototypes; Random sampling; Sampling methods; Social networks
Online access:	Full text

Description
Summary:	The Maximin-Directed Random Sampling (MMDRS) algorithm, which underlies numerous visual assessment techniques and scalable single linkage clustering, has a three-part structure: (i) Maximin (MM) sampling to identify prototypes; (ii) construction of a nearest prototype partition from the maximin samples; and (iii) directed random sampling from the partition subsets. Despite its diverse applications, MMDRS is computationally expensive. This study proposes an approximate form of the algorithm (AMMDRS) that aims to reduce its time complexity. Experiments compare the two directed random sampling methods, testing whether the samples they produce differ significantly and whether either method is superior to simple random sampling. The results show that AMMDRS is faster than MMDRS on all datasets tested, with no loss of sampling accuracy. This matters for big data applications, where processing the entire dataset is often infeasible. The study also finds that undirected random sampling represents parent distributions more faithfully than MM samples alone, thereby maximizing the diversity and representativeness of the selected points within the feature space. Overall, the study offers a promising way to make MMDRS more efficient and so more broadly applicable in data-intensive domains.
ISSN:	0992-499X 1958-5748
DOI:	10.18280/ria.370605
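
The three-part structure named in the summary can be illustrated with a small sketch. This is not the authors' implementation and it does not attempt the approximate variant (AMMDRS), which the record does not describe. It assumes Euclidean distance, points given as tuples of floats, a randomly chosen first prototype, and per-subset quotas proportional to subset size. All function and parameter names (`maximin_indices`, `nearest_prototype_partition`, `directed_random_sample`, `mmdrs`, `k`, `n`) are hypothetical.

```python
import math
import random


def maximin_indices(points, k, rng):
    """Step (i): pick k mutually distant points as prototypes (assumes k <= len(points))."""
    first = rng.randrange(len(points))  # assumption: random starting point
    chosen = [first]
    # nearest[i] holds the distance from point i to its closest prototype so far
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # the next prototype is the point farthest from all current prototypes
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def nearest_prototype_partition(points, prototypes):
    """Step (ii): assign every point index to the subset of its nearest prototype."""
    groups = [[] for _ in prototypes]
    for i, p in enumerate(points):
        j = min(range(len(prototypes)),
                key=lambda j: math.dist(p, points[prototypes[j]]))
        groups[j].append(i)
    return groups


def directed_random_sample(groups, n, total, rng):
    """Step (iii): draw from each subset in proportion to its size.

    Quotas are rounded up, so the result may slightly exceed n.
    """
    sample = []
    for g in groups:
        quota = min(len(g), math.ceil(n * len(g) / total))
        sample.extend(rng.sample(g, quota))  # without replacement
    return sample


def mmdrs(points, k, n, seed=0):
    """Run the three steps and return (prototypes, groups, sample) as index lists."""
    rng = random.Random(seed)
    prototypes = maximin_indices(points, k, rng)
    groups = nearest_prototype_partition(points, prototypes)
    sample = directed_random_sample(groups, n, len(points), rng)
    return prototypes, groups, sample
```

In this sketch the Maximin step costs on the order of `k * len(points)` distance evaluations and the partition step costs the same again, which gives a sense of why the summary describes the cost of MMDRS as a concern for large datasets.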