A fast and distributed C4.5 algorithm for urban big data

Bibliographic details
Published in: Intelligent data analysis, 2023-10, Vol. 27 (5), pp. 1379-1408
Authors: Cheng, Wan-Shu; Huang, Peng-Yu; Huang, Jheng-Yu; Chen, Ju-Chin; Lin, Kawuu W.
Format: Article
Language: English
Description
Abstract: The amount of information available today is growing rapidly. Alongside valuable information, the volume of information that is irrelevant to a target or meaningless is also growing. Big data and broader digital technologies are considered primary components of smart city governance and planning, and big data analysis is seen as defining a new era in urban planning, research, and policy. Effective data mining and pattern detection techniques are therefore becoming increasingly important. Processing such large volumes of data requires data mining, a technique that identifies associations among valid information and excludes irrelevant data; here it is used to build a practical decision tree. Large data volumes increase both processing time and I/O costs during mining. This study proposes distributing the data among multiple clients and dividing the computation equally among them to reduce the resource cost of mining. The main server then consolidates the clients' computation results and generates the mining results. Experimental results show that the proposed algorithm is superior, allowing larger amounts of data to be processed while still producing high-quality results.
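
The abstract describes clients computing over their own data partitions and a main server consolidating the results. One common way to realize this for C4.5 (an assumption here, not necessarily the authors' scheme) is for each client to send only class-count tables, from which the server computes the C4.5 gain ratio exactly, because summed counts equal the counts over the full dataset. The sketch below is hypothetical: the function names client_counts and server_best_split are illustrative, only categorical attributes are handled, and C4.5's continuous-threshold splits, pruning, and the client/server transport are omitted.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def client_counts(rows, attributes, label):
    """Hypothetical client step: tally (attribute, value, class) and class
    frequencies over the client's local partition. Only these counts are
    sent to the server, never the raw rows."""
    joint = Counter()
    classes = Counter()
    for row in rows:
        classes[row[label]] += 1
        for a in attributes:
            joint[(a, row[a], row[label])] += 1
    return joint, classes


def server_best_split(partials, attributes):
    """Hypothetical server step: merge the clients' tallies, score every
    attribute by C4.5 gain ratio, and return (best_attribute, scores)."""
    joint, classes = Counter(), Counter()
    for j, c in partials:
        joint.update(j)      # Counter.update adds counts rather than replacing
        classes.update(c)

    n = sum(classes.values())
    base = entropy(list(classes.values()))
    scores = {}
    for a in attributes:
        # Group merged class counts by attribute value: {value: Counter(class -> count)}
        by_value = {}
        for (attr, value, cls), cnt in joint.items():
            if attr == a:
                by_value.setdefault(value, Counter())[cls] += cnt
        groups = list(by_value.values())
        sizes = [sum(g.values()) for g in groups]
        # Weighted entropy of the class label after splitting on attribute a
        cond = sum(s / n * entropy(list(g.values())) for s, g in zip(sizes, groups))
        gain = base - cond
        split_info = entropy(sizes)  # C4.5 penalty for many-valued splits
        scores[a] = gain / split_info if split_info > 0 else 0.0
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy data split across two hypothetical clients
    client_a = [
        {"outlook": "sunny", "windy": "no", "play": "no"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
        {"outlook": "rain", "windy": "no", "play": "yes"},
    ]
    client_b = [
        {"outlook": "rain", "windy": "yes", "play": "yes"},
        {"outlook": "overcast", "windy": "no", "play": "yes"},
        {"outlook": "overcast", "windy": "yes", "play": "yes"},
    ]
    attrs = ["outlook", "windy"]
    partials = [client_counts(p, attrs, "play") for p in (client_a, client_b)]
    print(server_best_split(partials, attrs))
```

Because gain ratio depends only on counts, the server obtains the same split choice it would get from centralized data, while each client transmits a table whose size grows with the number of distinct attribute values rather than the number of rows. Growing a full tree would repeat this exchange for each node, with clients filtering their rows to that node's path.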
ISSN: 1088-467X, 1571-4128
DOI: 10.3233/IDA-220753