Defining a Metric Space of Host Logs and Operational Use Cases
Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs imp...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2018-11 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Verma, Miki E Bridges, Robert A |
description | Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2129790580</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2129790580</sourcerecordid><originalsourceid>FETCH-proquest_journals_21297905803</originalsourceid><addsrcrecordid>eNqNy7EOgjAQgOHGxESivMMlziTlKgKLC2oYNA7qTBo8SAlpsVfeXwcfwOlfvn8hIlQqTYod4krEzIOUEvc5ZpmKxOFInbHG9qDhSsGbFu6TbglcB7XjABfXM2j7gttEXgfjrB7hyQSVZuKNWHZ6ZIp_XYvt-fSo6mTy7j0Th2Zws_8u3GCKZV7KrJDqP_UBFKo3Aw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2129790580</pqid></control><display><type>article</type><title>Defining a Metric Space of Host Logs and Operational Use Cases</title><source>Free E- Journals</source><creator>Verma, Miki E ; Bridges, Robert A</creator><creatorcontrib>Verma, Miki E ; Bridges, Robert A</creatorcontrib><description>Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Analytics ; Clustering ; Cybersecurity ; Data mining ; Embedding ; Inspection ; Machine learning ; Metric space ; Sequences ; Similarity ; Streams ; Time series</subject><ispartof>arXiv.org, 2018-11</ispartof><rights>2018. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Verma, Miki E</creatorcontrib><creatorcontrib>Bridges, Robert A</creatorcontrib><title>Defining a Metric Space of Host Logs and Operational Use Cases</title><title>arXiv.org</title><description>Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.</description><subject>Algorithms</subject><subject>Analytics</subject><subject>Clustering</subject><subject>Cybersecurity</subject><subject>Data mining</subject><subject>Embedding</subject><subject>Inspection</subject><subject>Machine learning</subject><subject>Metric space</subject><subject>Sequences</subject><subject>Similarity</subject><subject>Streams</subject><subject>Time series</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNy7EOgjAQgOHGxESivMMlziTlKgKLC2oYNA7qTBo8SAlpsVfeXwcfwOlfvn8hIlQqTYod4krEzIOUEvc5ZpmKxOFInbHG9qDhSsGbFu6TbglcB7XjABfXM2j7gttEXgfjrB7hyQSVZuKNWHZ6ZIp_XYvt-fSo6mTy7j0Th2Zws_8u3GCKZV7KrJDqP_UBFKo3Aw</recordid><startdate>20181101</startdate><enddate>20181101</enddate><creator>Verma, Miki E</creator><creator>Bridges, Robert A</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20181101</creationdate><title>Defining a Metric Space of Host Logs and Operational Use Cases</title><author>Verma, Miki E ; Bridges, Robert A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_21297905803</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Analytics</topic><topic>Clustering</topic><topic>Cybersecurity</topic><topic>Data mining</topic><topic>Embedding</topic><topic>Inspection</topic><topic>Machine learning</topic><topic>Metric space</topic><topic>Sequences</topic><topic>Similarity</topic><topic>Streams</topic><topic>Time series</topic><toplevel>online_resources</toplevel><creatorcontrib>Verma, Miki E</creatorcontrib><creatorcontrib>Bridges, Robert A</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Verma, Miki E</au><au>Bridges, Robert A</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Defining a Metric Space of Host Logs and Operational Use Cases</atitle><jtitle>arXiv.org</jtitle><date>2018-11-01</date><risdate>2018</risdate><eissn>2331-8422</eissn><abstract>Host logs, in particular, Windows Event Logs, are a valuable source of information often collected by security operation centers (SOCs). The semi-structured nature of host logs inhibits automated analytics, and while manual analysis is common, the sheer volume makes manual inspection of all logs impossible. Although many powerful algorithms for analyzing time-series and sequential data exist, utilization of such algorithms for most cyber security applications is either infeasible or requires tailored, research-intensive preparations. In particular, basic mathematic and algorithmic developments for providing a generalized, meaningful similarity metric on system logs is needed to bridge the gap between many existing sequential data mining methods and this currently available but under-utilized data source. In this paper, we provide a rigorous definition of a metric product space on Windows Event Logs, providing an embedding that allows for the application of established machine learning and time-series analysis methods. We then demonstrate the utility and flexibility of this embedding with multiple use-cases on real data: (1) comparing known infected to new host log streams for attack detection and forensics, (2) collapsing similar streams of logs into semantically-meaningful groups (by user, by role), thereby reducing the quantity of data but not the content, (3) clustering logs as well as short sequences of logs to identify and visualize user behaviors and background processes over time. Overall, we provide a metric space framework for general host logs and log sequences that respects semantic similarity and facilitates a wide variety of data science analytics to these logs without data-specific preparations for each.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2018-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2129790580 |
source | Free E- Journals |
subjects | Algorithms Analytics Clustering Cybersecurity Data mining Embedding Inspection Machine learning Metric space Sequences Similarity Streams Time series |
title | Defining a Metric Space of Host Logs and Operational Use Cases |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T21%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Defining%20a%20Metric%20Space%20of%20Host%20Logs%20and%20Operational%20Use%20Cases&rft.jtitle=arXiv.org&rft.au=Verma,%20Miki%20E&rft.date=2018-11-01&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2129790580%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2129790580&rft_id=info:pmid/&rfr_iscdi=true |