Assessing the Robustness of Cluster Solutions Obtained From Sparse Count Matrices

Bibliographic details
Published in:	Psychological methods 2019-12, Vol.24 (6), p.675-689
Main authors:	Gates, Kathleen M., Fisher, Zachary F., Arizmendi, Cara, Henry, Teague R., Duffy, Kelly A., Mucha, Peter J.
Format:	Article
Language:	English
Subjects:	Adult; Cluster Analysis; Data Interpretation, Statistical; Humans; Mathematical Modeling; Monte Carlo Method; Neuroimaging; Psychology - methods; Simulation; Social Networking; Social Networks
Online access:	Full text

Description
Abstract:	Psychological researchers often seek to obtain cluster solutions from sparse count matrices (e.g., social networks; counts of symptoms that are in common for 2 given individuals; structural brain imaging). Increasingly, community detection methods are being used to subset the data in a data-driven manner. While many of these approaches perform well in simulation studies and thus offer some improvement upon traditional clustering approaches, there is no readily available approach for evaluating the robustness of these solutions in empirical data. Researchers have no way of knowing if their results are due to noise. We describe here 2 approaches novel to the field of psychology that enable evaluation of cluster solution robustness. This tutorial also explains the use of an associated R package, perturbR, which provides researchers with the ability to use the methods described herein. In the first approach, the cluster assignment from the original matrix is compared against cluster assignments obtained by randomly perturbing the edges in the matrix. Stable cluster solutions should not demonstrate large changes in the presence of small perturbations. For the second approach, Monte Carlo simulations of random matrices that have the same properties as the original matrix are generated. The distribution of quality scores ("modularity") obtained from the cluster solutions from these matrices is then compared with the score obtained from the original matrix results. From this, one can assess if the results are better than what would be expected by chance. perturbR automates these 2 methods, providing an easy-to-use resource for psychological researchers. We demonstrate the utility of this package using benchmark simulated data generated from a previous study and then apply the methods to publicly available empirical data obtained from social networks and structural neuroimaging.
Translational Abstract:	Oftentimes researchers want to identify subsets of individuals who are connected or similar in some way. This might be socially, as when two people are connected on social media. Another reason to subset individuals is to identify those who are similar in some way. In either case, researchers use a method called "cluster analysis" to arrive at these subsets of individuals. The problem is that not all data have subsets. Here, we present a way for researchers to assess the degree to which their data have subsets.
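Illustration:	The two approaches summarized in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the perturbR implementation and not the authors' code: every function name below is hypothetical, the clustering step is a naive greedy modularity merge chosen only to keep the example self-contained, the partition comparison uses the adjusted Rand index as one possible choice, and the random matrices preserve only the number of nodes and the set of edge weights (the article may preserve further properties).

```python
import random
from collections import Counter
from itertools import combinations
from math import comb

def modularity(adj, labels):
    """Newman modularity Q of a labelling on a symmetric count matrix."""
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    if two_m == 0:
        return 0.0
    n = len(adj)
    total = sum(adj[i][j] - deg[i] * deg[j] / two_m
                for i in range(n) for j in range(n) if labels[i] == labels[j])
    return total / two_m

def greedy_communities(adj):
    """Illustrative clustering: keep merging the pair of groups that raises Q most."""
    labels = list(range(len(adj)))
    best_q = modularity(adj, labels)
    while True:
        best = None
        for a, b in combinations(sorted(set(labels)), 2):
            trial = [a if lab == b else lab for lab in labels]
            q = modularity(adj, trial)
            if q > best_q + 1e-12:
                best_q, best = q, trial
        if best is None:
            return labels
        labels = best

def adjusted_rand(x, y):
    """Adjusted Rand index between two labellings of the same nodes."""
    pairs = lambda counts: sum(comb(c, 2) for c in counts)
    joint = pairs(Counter(zip(x, y)).values())
    a, b = pairs(Counter(x).values()), pairs(Counter(y).values())
    expected = a * b / comb(len(x), 2)
    max_index = (a + b) / 2
    return 1.0 if max_index == expected else (joint - expected) / (max_index - expected)

def perturb(adj, fraction, rng):
    """Approach 1 (assumed scheme): move a fraction of edges to empty positions."""
    slots = list(combinations(range(len(adj)), 2))
    edges = [s for s in slots if adj[s[0]][s[1]]]
    empty = [s for s in slots if not adj[s[0]][s[1]]]
    k = min(round(fraction * len(edges)), len(empty))
    new = [row[:] for row in adj]
    for (i, j), (p, q) in zip(rng.sample(edges, k), rng.sample(empty, k)):
        new[p][q] = new[q][p] = new[i][j]
        new[i][j] = new[j][i] = 0
    return new

def random_like(adj, rng):
    """Approach 2 (assumed null model): shuffle all edge weights across node pairs."""
    n = len(adj)
    slots = list(combinations(range(n), 2))
    values = [adj[i][j] for i, j in slots]
    rng.shuffle(values)
    new = [[0] * n for _ in range(n)]
    for (i, j), w in zip(slots, values):
        new[i][j] = new[j][i] = w
    return new

def two_cliques():
    """Toy data: two 4-node cliques joined by a single edge."""
    adj = [[0] * 8 for _ in range(8)]
    for group in (range(0, 4), range(4, 8)):
        for i, j in combinations(group, 2):
            adj[i][j] = adj[j][i] = 1
    adj[3][4] = adj[4][3] = 1
    return adj

if __name__ == "__main__":
    rng = random.Random(0)
    adj = two_cliques()
    original = greedy_communities(adj)
    # Approach 1: agreement with the original solution as perturbation grows.
    for fraction in (0.05, 0.10, 0.20, 0.50):
        scores = [adjusted_rand(original, greedy_communities(perturb(adj, fraction, rng)))
                  for _ in range(20)]
        print(f"perturbed {fraction:.0%}: mean ARI = {sum(scores) / len(scores):.2f}")
    # Approach 2: original modularity against a Monte Carlo null distribution.
    q_orig = modularity(adj, original)
    null = []
    for _ in range(100):
        rand_adj = random_like(adj, rng)
        null.append(modularity(rand_adj, greedy_communities(rand_adj)))
    share = sum(q >= q_orig for q in null) / len(null)
    print(f"Q = {q_orig:.3f}; share of random matrices with Q at least as high: {share:.2f}")
```

Read in the abstract's terms, a stable solution would keep a high agreement score while only a small fraction of edges is perturbed, and a solution that is better than chance would have a modularity in the upper tail of the scores obtained from the random matrices.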
ISSN:	1082-989X 1939-1463
DOI:	10.1037/met0000204