Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering

Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. Thes...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Scientific reports 2015-11, Vol.5 (1), p.16361-16361, Article 16361
Hauptverfasser: Gaiteri, Chris, Chen, Mingming, Szymanski, Boleslaw, Kuzmin, Konstantin, Xie, Jierui, Lee, Changkyu, Blanche, Timothy, Chaibub Neto, Elias, Huang, Su-Chun, Grabowski, Thomas, Madhyastha, Tara, Komashko, Vitalina
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. These detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These aspects of traditional clustering methods limit our ability to detect biological communities and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections, as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including: gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.
ISSN:2045-2322
2045-2322
DOI:10.1038/srep16361