A Criterion for Determining the Number of Groups in a Data Set Using Sum-of-Squares Clustering

Bibliographic details
Published in: Biometrics 1988-03, Vol. 44 (1), p. 23-34
Main authors: Krzanowski, W. J.; Lai, Y. T.
Format: Article
Language: English
Description
Summary: Marriott (1971, Biometrics 27, 501-514) used a heuristic argument to derive the criterion g²|W| for determining the number of groups in a data set when the clustering objective function is the within-group determinant |W|. An analogous argument is employed to derive a criterion for use with the within-group sum-of-squares objective function trace(W). The behaviour of both Marriott's criterion and the new criterion is investigated by Monte Carlo methods. For homogeneous data based on uniform and independent variables, the performance of the new criterion is close to expectation while Marriott's criterion shows much more extreme behaviour. For grouped data, the new criterion correctly identifies the number of groups in 85% of data sets under a wide range of conditions, while Marriott's criterion shows a success rate of less than 40%. The new criterion is illustrated on the well-known Iris data, and some cautionary comments are made about its use.
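The two quantities named in the summary, Marriott's g²|W| and the sum-of-squares objective trace(W), can be computed for a given partition as in the minimal sketch below. It assumes W is the pooled within-group sum-of-squares-and-products matrix; the function names and the toy data are illustrative and do not come from the article. The new criterion itself is not reproduced, since this record does not state its formula.

```python
import numpy as np

def within_group_scatter(X, labels):
    """Pooled within-group sum-of-squares-and-products matrix W (assumed definition)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    p = X.shape[1]
    W = np.zeros((p, p))
    for lab in np.unique(labels):
        rows = X[labels == lab]
        centered = rows - rows.mean(axis=0)  # deviations from the group mean
        W += centered.T @ centered
    return W

def marriott_criterion(X, labels):
    """g^2 |W| for the partition given by labels, with g the number of groups."""
    g = len(np.unique(labels))
    return g ** 2 * np.linalg.det(within_group_scatter(X, labels))

def trace_objective(X, labels):
    """Within-group sum of squares, trace(W), for the partition given by labels."""
    return np.trace(within_group_scatter(X, labels))

# Hypothetical toy data: two well-separated triangles in the plane.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]])
one_group = np.zeros(6, dtype=int)
two_groups = np.array([0, 0, 0, 1, 1, 1])

for name, labels in [("g=1", one_group), ("g=2", two_groups)]:
    print(name, marriott_criterion(X, labels), trace_objective(X, labels))
```

In practice the labels for each candidate g would come from a clustering run that optimises the chosen objective, and the criterion values would then be compared across g. On this toy data both quantities are much smaller for the two-group partition than for the single group.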
ISSN: 0006-341X, 1541-0420
DOI:10.2307/2531893