Finding Malicious Cyber Discussions in Social Media
Main authors: | , , , , |
---|---|
Format: | Report |
Language: | eng |
Online access: | Order full text |
Summary: | Security analysts gather essential information on cyber attacks, exploits, vulnerabilities, and victims by manually searching social media sites. This effort can be dramatically reduced using natural language machine learning techniques. Using a new English text corpus containing more than 250k discussions from Stack Exchange, Reddit, and Twitter on cyber and non-cyber topics, we demonstrate the ability to detect more than 90% of the cyber discussions with fewer than 1% false alarms. If an original searched document corpus includes only 5% cyber documents, then our processing provides an enriched corpus for analysts where 83% to 95% of the documents are on cyber topics. Good performance was obtained using TF-IDF features and logistic regression. A classifier trained using prior historical data accurately detected 86% of emergent Heartbleed discussions, and retrospective experiments demonstrate that classifier performance remains stable for up to a year without retraining. |
---|
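
The enrichment figure in the summary can be checked with a short calculation. The sketch below is an illustration, not the authors' code: the function name `enriched_fraction` is hypothetical, and it assumes the enriched corpus consists of every document the classifier flags. The 5% prior, 90% detection rate, and 1% false-alarm rate come from the summary; the 0.25% false-alarm rate is an assumed value chosen only to show how the upper end of the stated range could arise.

```python
def enriched_fraction(prior: float, detection_rate: float, false_alarm_rate: float) -> float:
    """Fraction of flagged documents that are truly on cyber topics.

    prior            -- share of cyber documents in the original corpus
    detection_rate   -- share of cyber documents the classifier flags
    false_alarm_rate -- share of non-cyber documents the classifier flags
    """
    true_hits = prior * detection_rate
    false_hits = (1.0 - prior) * false_alarm_rate
    return true_hits / (true_hits + false_hits)


# Figures taken from the summary: 5% cyber documents, 90% detected, 1% false alarms.
low = enriched_fraction(0.05, 0.90, 0.01)

# Assumed, illustrative false-alarm rate of 0.25% (not stated in the summary).
high = enriched_fraction(0.05, 0.90, 0.0025)

print(f"{low:.0%}")   # 83%
print(f"{high:.0%}")  # 95%
```

Because non-cyber documents outnumber cyber documents 19 to 1 at a 5% prior, the false-alarm rate dominates the result: cutting it from 1% to 0.25% moves the enriched share from about 83% to about 95% even with the detection rate unchanged.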