Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction

Bibliographic Details
Published in:	Journal of Computer Science and Technology, 2023-09, Vol. 38 (5), pp. 1036-1050
Main authors:	Tian, Ran; Diao, Zu-Long; Jiang, Hai-Yang; Xie, Gao-Gang
Format:	Article
Language:	English
Subjects:	Analysis; Artificial Intelligence; Cognition; Cognition & reasoning; Computational linguistics; Computer Science; Data Structures and Information Theory; Datasets; Information Systems Applications (incl. Internet); Language processing; Lower bounds; Natural language interfaces; Parsers; Regular Paper; Software Engineering; Theory of Computation
Online access:	Full text

Description
Summary:	Logs contain runtime information for both systems and users. Because many logs are written in natural language, a typical log-based analysis must first parse them into a structured format. Existing parsing approaches often take two steps. First, they find similar words (tokens) or sentences. Second, they extract log templates by replacing differing tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs but lack a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose Cognition, our online log parsing approach. It first redefines variable placeholders via a strict lower bound to avoid ambiguity. It then applies our template correction technique to merge and absorb similar templates. This eliminates the interference of commonly used parameters and thus isolates template quantity. Evaluation on 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also reduces time cost by up to 52.1% on average compared with the other approaches.
ISSN:	1000-9000, 1860-4749
DOI:	10.1007/s11390-021-1691-3
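
Illustration: the generic two-step parsing scheme that the summary attributes to existing approaches (group similar logs, then replace differing tokens with variable placeholders) can be sketched roughly as follows. This is a minimal sketch under stated assumptions and is not the Cognition method: grouping by token count, the `<*>` placeholder, the function name `extract_templates`, and the sample log lines are all hypothetical choices made for this example.

```python
from collections import defaultdict

PLACEHOLDER = "<*>"  # assumed placeholder symbol, not taken from the paper


def extract_templates(lines):
    """Naive two-step parsing: group logs, then mask the differing tokens."""
    # Step 1 (assumed similarity rule): logs with the same token count
    # are treated as belonging to the same group.
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        groups[len(tokens)].append(tokens)

    # Step 2: within each group, keep a token where every log agrees
    # and substitute the placeholder where the logs differ.
    templates = []
    for length in sorted(groups):
        template = [
            column[0] if len(set(column)) == 1 else PLACEHOLDER
            for column in zip(*groups[length])
        ]
        templates.append(" ".join(template))
    return templates


# Hypothetical log lines, invented for illustration only.
logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 91 percent on node7",
]
templates = extract_templates(logs)
for template in templates:
    print(template)
```

In this sketch the third log forms a group of its own, so its values (91, node7) remain constants in the template until a second, similar log arrives. That is one simple way a loosely defined placeholder rule can produce inconsistent templates over time, which is the kind of problem the summary describes.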