PLDLS: A novel parallel label diffusion and label selection-based community detection algorithm based on Spark in social networks

Bibliographic details
Published in:	Expert Systems with Applications, 2021-11, Vol. 183, p. 115377, Article 115377
Main authors:	Roghani, Hamid; Bouyer, Asgarali; Nourani, Esmaeil
Format:	Article
Language:	English
Keywords:	Algorithms; Complexity; Diffusion rate; Iterative methods; Label diffusion; Label selection; Labels; Local similarity; Nodes; Parallel community detection; Social networks; Spark

Description
Abstract:	•A novel fast and accurate Spark-based parallel community detection algorithm is proposed.
•The proposed PLDLS algorithm uses label diffusion of core nodes along with a new label selection method.
•Multi-factor criteria for computing node importance are used to select core nodes.
•A fast, parallel merge phase is used to obtain denser and more accurate communities.
•The results of PLDLS are robust, stable, and scalable in comparison with the other examined methods.

Parallel and distributed community detection in large-scale complex networks, such as social networks, is a challenging task. Designing parallel and distributed algorithms with high accuracy and low computational complexity is one of the essential issues in the community detection field. In this paper, we propose a novel, fast, and accurate Spark-based parallel label diffusion and label selection-based (PLDLS) community detection algorithm with two steps: label diffusion of core nodes, followed by a new label selection (propagation) method. We use multi-factor criteria to compute node importance and adopt a new method for selecting core nodes. In the first phase, exploiting the fact that nodes forming triangles tend to be in the same community, parallel label diffusion of core nodes spreads labels up to two levels. In the second phase, through an iterative and parallel process, the most appropriate labels are assigned to the remaining nodes. PLDLS is an improved, robust version of the label propagation algorithm (LPA) that sets aside randomness and parameter tuning. Furthermore, we use a fast, parallel merge phase to obtain even denser and more accurate communities. Experiments conducted on real-world and artificial networks indicate the better accuracy and lower execution time of PLDLS compared with the other examined methods.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2021.115377
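The abstract describes the two phases only at a high level. The following is a minimal, single-machine Python sketch of the general idea under stated assumptions; it is not the authors' PLDLS implementation and not Spark code. The importance score (degree plus triangle count), the core-node rule (no neighbor scores higher), the two-level diffusion restricted to edges that close a triangle, and the deterministic tie-breaking in label selection are all hypothetical stand-ins, and the merge phase is omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node_importance(adj):
    """Hypothetical multi-factor score: degree plus number of triangles."""
    score = {}
    for u in adj:
        triangles = sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
        score[u] = len(adj[u]) + triangles
    return score


def select_core_nodes(adj, score):
    """Hypothetical rule: a node is core if no neighbor has a higher score."""
    return [u for u in adj if all(score[u] >= score[v] for v in adj[u])]


def diffuse_labels(adj, score, cores, levels=2):
    """Phase 1 sketch: spread each core's label up to `levels` hops, only
    along edges whose endpoints share a neighbor (i.e. close a triangle).
    Cores are processed by descending score; a core already labeled by a
    stronger core keeps that label."""
    labels = {}
    for c in sorted(cores, key=lambda n: (-score[n], n)):
        if c in labels:
            continue
        labels[c] = c
        frontier = [c]
        for _ in range(levels):
            nxt = []
            for u in frontier:
                for v in sorted(adj[u]):
                    if v not in labels and adj[u] & adj[v]:
                        labels[v] = c
                        nxt.append(v)
            frontier = nxt
    return labels


def select_labels(adj, score, labels, max_iter=20):
    """Phase 2 sketch: give each unlabeled node the label most common among
    its labeled neighbors (ties: larger summed neighbor score, then smaller
    label). No randomness; repeat until no node changes."""
    labels = dict(labels)
    for _ in range(max_iter):
        updates = {}
        for u in sorted(adj):
            if u in labels:
                continue
            votes, weight = Counter(), Counter()
            for v in adj[u]:
                if v in labels:
                    votes[labels[v]] += 1
                    weight[labels[v]] += score[v]
            if votes:
                updates[u] = min(votes, key=lambda l: (-votes[l], -weight[l], l))
        if not updates:
            break
        labels.update(updates)
    for u in adj:  # nodes never reached keep their own label
        labels.setdefault(u, u)
    return labels


# Usage: two 4-cliques joined by the bridge 3-4, plus a pendant node 8.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4), (7, 8)]
adj = build_adjacency(edges)
score = node_importance(adj)
cores = select_core_nodes(adj, score)
labels = select_labels(adj, score, diffuse_labels(adj, score, cores))
communities = defaultdict(list)
for node, label in sorted(labels.items()):
    communities[label].append(node)
print(sorted(communities.values()))  # [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
```

The triangle constraint is what keeps the bridge edge 3-4 from carrying a label across cliques in this toy graph: nodes 3 and 4 share no neighbor, so diffusion stops there, and the pendant node 8 is only labeled in the selection phase. A Spark version would express each phase as distributed message passing over the graph rather than these sequential loops.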