Host-Based P2P Flow Identification and Use in Real-Time

Data identification and classification is a key task for any Internet Service Provider (ISP) or network administrator. As port fluctuation and encryption become more common in P2P applications wishing to avoid identification, new strategies must be developed to detect and classify their flows. This...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:ACM transactions on the web 2011-05, Vol.5 (2), p.1-27
Hauptverfasser: Hurley, John, Garcia-Palacios, Emi, Sezer, Sakir
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Data identification and classification is a key task for any Internet Service Provider (ISP) or network administrator. As port fluctuation and encryption become more common in P2P applications wishing to avoid identification, new strategies must be developed to detect and classify their flows. This article introduces a method of separating P2P and standard web traffic that can be applied as part of an offline data analysis process, based on the activity of the hosts on the network. Heuristics are analyzed and a classification system proposed that focuses on classifying those “long” flows that transfer most of the bytes across a network. The accuracy of the system is then tested using real network traffic from a core Internet router showing misclassification rates as low as 0.54% of flows in some cases. We expand on this proposed strategy to investigate its relevance to real-time, early classification problems. New proposals are made and the results of real-time experiments are compared to those obtained in the offline analysis. It is shown that classification accuracies in the real-time strategy are similar to those achieved in offline analysis with a large portion of the total web and P2P flows correctly identified.
ISSN:1559-1131
1559-114X
DOI:10.1145/1961659.1961661