An Automatic Clustering Algorithm and Its Properties in High-Dimensional Spaces

An economical technique for approximating a joint N-dimensional probability density function has been described by Sebestyen and Edie [20]. The algorithm searches for clusters of points and considers each cluster as one hyperellipsoidal cell in an N-dimensional histogram. Among the advantages of thi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on systems, man, and cybernetics man, and cybernetics, 1972-04, Vol.SMC-2 (2), p.247-254
Hauptverfasser: Mucciardi, Anthony N., Gose, Earl E.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:An economical technique for approximating a joint N-dimensional probability density function has been described by Sebestyen and Edie [20]. The algorithm searches for clusters of points and considers each cluster as one hyperellipsoidal cell in an N-dimensional histogram. Among the advantages of this scheme are: 1) the histogram cell descriptors-location, shape, and size-can be determined adaptively from sequentially introduced data samples of known classification and, 2) the number of cells required for a good fit can usually be held to a small number. No assumptions are required about the underlying statistical structure of the data. The algorithm requires three types of "control parameters" which critically affect its performance and are dependent upon the number of dimensions. The three factors control the birth, shape, and growth rate of the cells. Guides were presented in [20] for choosing the control parameter values. These guides functioned well for spaces of 3 dimensions or less, but did not yield usable values for spaces of greater dimensionality. This paper presents heuristics which were developed to automate the selection of the control parameters. The properties of these parameters were studied as a function of dimension. Two of the control parameters were found to be linearly related to dimension. This provides a method for determining their value by extrapolation, thereby avoiding a great deal of computation.
ISSN:0018-9472
2168-2909
DOI:10.1109/TSMC.1972.4309100