Defining a Metric Space of Host Logs and Operational Use Cases

Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs imp...

Bibliographic details
Published in: arXiv.org 2018-11
Main authors: Verma, Miki E; Bridges, Robert A
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Verma, Miki E
Bridges, Robert A
description Host logs, in particular Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematical and algorithmic developments for providing a generalized, meaningful similarity metric on system logs are needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use cases on real data: (1) comparing known infected host log streams to new ones for attack detection and forensics, (2) collapsing similar streams of logs into semantically meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, and (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates applying a wide variety of data science analytics to these logs without data-specific preparations for each.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2129790580</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2129790580</sourcerecordid><originalsourceid>FETCH-proquest_journals_21297905803</originalsourceid><addsrcrecordid>eNqNy7EOgjAQgOHGxESivMMlziTlKgKLC2oYNA7qTBo8SAlpsVfeXwcfwOlfvn8hIlQqTYod4krEzIOUEvc5ZpmKxOFInbHG9qDhSsGbFu6TbglcB7XjABfXM2j7gttEXgfjrB7hyQSVZuKNWHZ6ZIp_XYvt-fSo6mTy7j0Th2Zws_8u3GCKZV7KrJDqP_UBFKo3Aw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2129790580</pqid></control><display><type>article</type><title>Defining a Metric Space of Host Logs and Operational Use Cases</title><source>Free E- Journals</source><creator>Verma, Miki E ; Bridges, Robert A</creator><creatorcontrib>Verma, Miki E ; Bridges, Robert A</creatorcontrib><description>Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. 
We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Analytics ; Clustering ; Cybersecurity ; Data mining ; Embedding ; Inspection ; Machine learning ; Metric space ; Sequences ; Similarity ; Streams ; Time series</subject><ispartof>arXiv.org, 2018-11</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Verma, Miki E</creatorcontrib><creatorcontrib>Bridges, Robert A</creatorcontrib><title>Defining a Metric Space of Host Logs and Operational Use Cases</title><title>arXiv.org</title><description>Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). 
The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. 
Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.</description><subject>Algorithms</subject><subject>Analytics</subject><subject>Clustering</subject><subject>Cybersecurity</subject><subject>Data mining</subject><subject>Embedding</subject><subject>Inspection</subject><subject>Machine learning</subject><subject>Metric space</subject><subject>Sequences</subject><subject>Similarity</subject><subject>Streams</subject><subject>Time series</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNy7EOgjAQgOHGxESivMMlziTlKgKLC2oYNA7qTBo8SAlpsVfeXwcfwOlfvn8hIlQqTYod4krEzIOUEvc5ZpmKxOFInbHG9qDhSsGbFu6TbglcB7XjABfXM2j7gttEXgfjrB7hyQSVZuKNWHZ6ZIp_XYvt-fSo6mTy7j0Th2Zws_8u3GCKZV7KrJDqP_UBFKo3Aw</recordid><startdate>20181101</startdate><enddate>20181101</enddate><creator>Verma, Miki E</creator><creator>Bridges, Robert A</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20181101</creationdate><title>Defining a Metric Space of Host Logs and Operational Use Cases</title><author>Verma, Miki E ; Bridges, Robert 
A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_21297905803</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Analytics</topic><topic>Clustering</topic><topic>Cybersecurity</topic><topic>Data mining</topic><topic>Embedding</topic><topic>Inspection</topic><topic>Machine learning</topic><topic>Metric space</topic><topic>Sequences</topic><topic>Similarity</topic><topic>Streams</topic><topic>Time series</topic><toplevel>online_resources</toplevel><creatorcontrib>Verma, Miki E</creatorcontrib><creatorcontrib>Bridges, Robert A</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Verma, Miki E</au><au>Bridges, Robert A</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Defining a Metric Space of Host Logs and Operational 
Use Cases</atitle><jtitle>arXiv.org</jtitle><date>2018-11-01</date><risdate>2018</risdate><eissn>2331-8422</eissn><abstract>Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2018-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_2129790580
source Free E- Journals
subjects Algorithms
Analytics
Clustering
Cybersecurity
Data mining
Embedding
Inspection
Machine learning
Metric space
Sequences
Similarity
Streams
Time series
title Defining a Metric Space of Host Logs and Operational Use Cases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T21%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Defining%20a%20Metric%20Space%20of%20Host%20Logs%20and%20Operational%20Use%20Cases&rft.jtitle=arXiv.org&rft.au=Verma,%20Miki%20E&rft.date=2018-11-01&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2129790580%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2129790580&rft_id=info:pmid/&rfr_iscdi=true
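
The description above refers to a "metric product space" on log records. The record does not give the paper's actual construction, so the following is only a minimal sketch of the general idea, assuming a log event with three hypothetical fields (`event_id`, `timestamp`, `tokens`) and a weighted sum of per-field metrics. The field names, the choice of per-field metrics, and the weights are illustrative assumptions, not the authors' definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical, simplified log record (not the paper's schema)."""
    event_id: int
    timestamp: float        # seconds since some reference point
    tokens: frozenset       # words extracted from the message text


def discrete_distance(a, b):
    """Discrete metric: 0 if the values are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def time_distance(a, b, scale=60.0):
    """Bounded metric on timestamps: d / (d + scale), with d = |a - b|."""
    d = abs(a - b)
    return d / (d + scale)


def jaccard_distance(a, b):
    """Jaccard distance between two sets; a metric on finite sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def product_distance(x, y, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-field metrics.

    A sum of metrics with positive weights is itself a metric on the
    product of the field spaces, which is the property this sketch shows.
    """
    w_id, w_time, w_tok = weights
    return (
        w_id * discrete_distance(x.event_id, y.event_id)
        + w_time * time_distance(x.timestamp, y.timestamp)
        + w_tok * jaccard_distance(x.tokens, y.tokens)
    )


# Three made-up events, used only to exercise the distance function.
a = LogEvent(4624, 0.0, frozenset({"logon", "alice"}))
b = LogEvent(4624, 30.0, frozenset({"logon", "bob"}))
c = LogEvent(4688, 45.0, frozenset({"process", "alice"}))

if __name__ == "__main__":
    print(product_distance(a, b), product_distance(a, c), product_distance(b, c))
```

Because each component is a metric, the combined distance satisfies identity, symmetry, and the triangle inequality, so it could in principle be passed to distance-based clustering or nearest-neighbor methods of the kind the description mentions.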