Defining a Metric Space of Host Logs and Operational Use Cases
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Host logs, in particular Windows Event Logs, are a valuable source of
information often collected by security operations centers (SOCs). The
semi-structured nature of host logs inhibits automated analytics, and while
manual analysis is common, the sheer volume makes manual inspection of all logs
impossible. Although many powerful algorithms for analyzing time-series and
sequential data exist, utilization of such algorithms for most cyber security
applications is either infeasible or requires tailored, research-intensive
preparations. In particular, basic mathematical and algorithmic developments
that provide a generalized, meaningful similarity metric on system logs are needed
to bridge the gap between many existing sequential data mining methods and this
currently available but under-utilized data source. In this paper, we provide a
rigorous definition of a metric product space on Windows Event Logs, providing
an embedding that allows for the application of established machine learning
and time-series analysis methods. We then demonstrate the utility and
flexibility of this embedding with multiple use cases on real data: (1)
comparing known infected to new host log streams for attack detection and
forensics, (2) collapsing similar streams of logs into semantically meaningful
groups (by user, by role), thereby reducing the quantity of data but not the
content, and (3) clustering logs as well as short sequences of logs to identify and
visualize user behaviors and background processes over time. Overall, we
provide a metric space framework for general host logs and log sequences that
respects semantic similarity and facilitates applying a wide variety of data science
analytics to these logs without data-specific preparations for each. |
---|---|
DOI: | 10.48550/arxiv.1811.00591 |
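
The summary refers to a metric product space on log events. As a rough illustration of the general idea only, the sketch below combines per-field metrics into a weighted p-product metric. The `LogEvent` fields, the choice of per-field metrics, the weights, and the 60-second time scale are all assumptions made for this example; they are not taken from the paper, which defines its own construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    # Hypothetical fields chosen for illustration; not the paper's feature set.
    event_id: int
    user: str
    timestamp: float  # seconds


def discrete(a, b) -> float:
    """Discrete metric: 0 if the values are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def bounded_time(a: float, b: float, scale: float = 60.0) -> float:
    """Time gap squashed into [0, 1); t / (1 + t) keeps the metric axioms."""
    t = abs(a - b) / scale
    return t / (1.0 + t)


def product_distance(x: LogEvent, y: LogEvent,
                     weights=(1.0, 1.0, 1.0), p: float = 2.0) -> float:
    """Weighted p-product of the per-field metrics (positive weights, p >= 1)."""
    parts = (
        discrete(x.event_id, y.event_id),
        discrete(x.user, y.user),
        bounded_time(x.timestamp, y.timestamp),
    )
    return sum(w * d ** p for w, d in zip(weights, parts)) ** (1.0 / p)


# Three made-up events for a quick check of the distance.
a = LogEvent(4624, "alice", 0.0)
b = LogEvent(4624, "alice", 60.0)
c = LogEvent(4688, "bob", 0.0)
```

Because each component is itself a metric, the combined distance is symmetric, is zero exactly for identical events, and satisfies the triangle inequality, which is what lets standard clustering and nearest-neighbor methods operate on it directly.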