A Criterion for Determining the Number of Groups in a Data Set Using Sum-of-Squares Clustering
Published in: | Biometrics 1988-03, Vol. 44 (1), pp. 23-34 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: Marriott (1971, Biometrics 27, 501-514) used a heuristic argument to derive the criterion g²|W| for determining the number of groups in a data set when the clustering objective function is the within-group determinant |W|. An analogous argument is employed to derive a criterion for use with the within-group sum-of-squares objective function trace(W). The behaviour of both Marriott's criterion and the new criterion is investigated by Monte Carlo methods. For homogeneous data based on uniform and independent variables, the new criterion behaves close to expectation, whereas Marriott's criterion shows much more extreme behaviour. For grouped data, the new criterion correctly identifies the number of groups in 85% of data sets under a wide range of conditions, whereas Marriott's criterion has a success rate of less than 40%. The new criterion is illustrated on the well-known Iris data, and some cautionary comments are made about its use.

ISSN: | 0006-341X, 1541-0420 |
---|---|
DOI: | 10.2307/2531893 |
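The abstract contrasts two objective functions built from the pooled within-group scatter matrix W: the determinant |W| (used in Marriott's g²|W|) and trace(W). The sketch below, assuming NumPy, computes W for a partition, Marriott's g²|W|, and trace(W) for g = 1 to 5. How partitions are obtained is not stated in this record; here they come from k-means (Lloyd's algorithm with k-means++ seeding and restarts), which is our own choice of trace(W) minimiser. The ratio in `kl_index` is an assumption: it is the form in which the trace(W)-based criterion is commonly quoted in secondary sources (pick the g that maximises it), not something taken from this record, so check it against the paper. All function names and the synthetic data are hypothetical illustrations, not the paper's simulation design.

```python
import numpy as np


def within_scatter(X, labels):
    """Pooled within-group scatter matrix W (p x p) for a given partition."""
    p = X.shape[1]
    W = np.zeros((p, p))
    for k in np.unique(labels):
        D = X[labels == k] - X[labels == k].mean(axis=0)  # centre each group
        W += D.T @ D
    return W


def marriott(X, labels):
    """Marriott's criterion g^2 |W|, with g the number of non-empty groups."""
    g = len(np.unique(labels))
    return g ** 2 * np.linalg.det(within_scatter(X, labels))


def kmeans_trace(X, g, n_starts=10, n_iter=100, seed=0):
    """Approximately minimise trace(W) over g-group partitions using Lloyd's
    algorithm with k-means++ seeding and several restarts (our choice of
    method). Returns (labels, trace(W)) for the best restart."""
    rng = np.random.default_rng(seed)
    best_labels, best_tr = None, np.inf
    for _ in range(n_starts):
        centres = [X[rng.integers(len(X))]]
        for _ in range(1, g):  # k-means++: favour points far from chosen centres
            d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(axis=2).min(axis=1)
            centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centres[None]) ** 2).sum(axis=2).argmin(axis=1)
            # Recompute group means; keep the old centre if a group empties.
            new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centres[k] for k in range(g)])
            if np.allclose(new, centres):
                break
            centres = new
        tr = np.trace(within_scatter(X, labels))
        if tr < best_tr:
            best_labels, best_tr = labels, tr
    return best_labels, best_tr


def kl_index(trW, p):
    """|DIFF(g) / DIFF(g+1)| with DIFF(g) = (g-1)^(2/p) trW[g-1] - g^(2/p) trW[g].
    ASSUMPTION: commonly quoted form of the trace(W) criterion; verify against
    the paper. Defined only where trW[g-1] and trW[g+1] are both available."""
    def diff(g):
        return (g - 1) ** (2 / p) * trW[g - 1] - g ** (2 / p) * trW[g]
    return {g: abs(diff(g) / diff(g + 1))
            for g in trW if g - 1 in trW and g + 1 in trW}


if __name__ == "__main__":
    # Hypothetical data: three well-separated bivariate normal groups.
    rng = np.random.default_rng(1)
    means = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
    X = np.vstack([m + rng.standard_normal((50, 2)) for m in means])
    trW, marr = {}, {}
    for g in range(1, 6):
        labels, trW[g] = kmeans_trace(X, g)
        marr[g] = marriott(X, labels)
    kl = kl_index(trW, p=X.shape[1])
    print("trace(W):", trW)
    print("g^2|W|:", marr)
    print("ratio:", kl, "-> suggested g =", max(kl, key=kl.get))
```

Because the ratio needs trace(W) at g - 1 and g + 1, it is only defined for interior values of g in the range searched; the exponent 2/p is what makes the criterion depend on the number of variables p.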