Defining a Metric Space of Host Logs and Operational Use Cases

Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs imp...

Bibliographic Details
Main Authors: Verma, Miki E; Bridges, Robert A
Format: Article
Language: eng
creator Verma, Miki E
Bridges, Robert A
description Host logs, in particular Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematical and algorithmic developments that provide a generalized, meaningful similarity metric on system logs are needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use cases on real data: (1) comparing log streams from known-infected hosts to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, and (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates applying a wide variety of data science analytics to these logs without data-specific preparation for each.
doi_str_mv 10.48550/arxiv.1811.00591
format Article
creationdate 2018-11
date 2018-11-01
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
collection arXiv Computer Science
arXiv Statistics
arXiv.org
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1811.00591
language eng
recordid cdi_arxiv_primary_1811_00591
source arXiv.org
subjects Computer Science - Cryptography and Security
Statistics - Applications
title Defining a Metric Space of Host Logs and Operational Use Cases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T20%3A42%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Defining%20a%20Metric%20Space%20of%20Host%20Logs%20and%20Operational%20Use%20Cases&rft.au=Verma,%20Miki%20E&rft.date=2018-11-01&rft_id=info:doi/10.48550/arxiv.1811.00591&rft_dat=%3Carxiv_GOX%3E1811_00591%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true