High Speed Error Log Control Method in In-memory Cluster Computing Platform

Bibliographic details
Published in:	Journal of Information Processing 2020, Vol.28, pp.310-319
Main authors:	Saito, Ryuichi; Haruyama, Shinichiro
Format:	Article
Language:	English
Subjects:	Distributed System; error logs; k-means; Spark; TF-IDF
Online access:	Full text

Description
Summary:	Since 2010, in-memory cluster computing platforms have been increasingly used by firms and research institutions to analyze large datasets in a short time. On these platforms, unexpected errors can push the load on computing infrastructure, such as a monitoring system, beyond what it was designed for, because jobs run multithreaded, divided datasets are assigned to multiple nodes, and data is stored in memory. In this research, we propose a method that notifies administrators with only the information needed to understand the situation in a short period, by eliminating duplicates among the numerous application error logs for that period and clustering the messages with k-means, an unsupervised learning method, on the in-memory cluster computing framework Apache Spark. By implementing this method, we demonstrate that duplicate error messages can be reduced by 93% on average compared with conventional methods. Further, we can extract significant messages from the application error messages and notify administrators an average of 4.2 min after the error occurs.
ISSN:	1882-6652
DOI:	10.2197/ipsjjip.28.310
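
The summary describes a pipeline of deduplicating error logs, weighting terms (the subject terms list TF-IDF), and clustering with k-means. The following is a minimal single-machine sketch of that general idea in plain Python, not the authors' Spark implementation. The function names, the number-masking rule in `normalize`, the smoothed IDF formula, and the choice to report the most frequent message per cluster are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def normalize(message):
    """Mask volatile tokens (hex ids, numbers) so repeated errors compare equal.

    Assumption: the masking rule is illustrative, not taken from the paper.
    """
    masked = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message.lower())
    return " ".join(masked.split())


def deduplicate(messages):
    """Return (normalized message, occurrence count) pairs in first-seen order."""
    counts = Counter(normalize(m) for m in messages)
    return list(counts.items())


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant) keeps every weight positive.
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty messages
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def summarize(messages, k):
    """Deduplicate, cluster, and return one (message, count) pair per cluster."""
    unique = deduplicate(messages)
    if not unique:
        return []
    docs = [m for m, _ in unique]
    _, vectors = tfidf_vectors(docs)
    labels = kmeans(vectors, min(k, len(docs)))
    best = {}
    for (msg, count), label in zip(unique, labels):
        # Keep the most frequent message as the cluster's representative.
        if label not in best or count > best[label][1]:
            best[label] = (msg, count)
    return [best[label] for label in sorted(best)]
```

Calling `summarize(log_lines, k)` on a batch of raw error lines returns at most `k` representative messages with their occurrence counts, which is the kind of condensed notification the summary describes. On a Spark cluster the same steps would presumably be expressed with Spark's own distributed feature extraction and clustering rather than these hand-written loops.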