Mobile app identification for encrypted network flows by traffic correlation

Bibliographic details
Published in: International journal of distributed sensor networks 2018-12, Vol.14 (12), p.155014771881729
Main authors: He, Gaofeng; Xu, Bingfeng; Zhang, Lu; Zhu, Haiting
Format: Article
Language: English
Description
Abstract: Mobile application (simply “app”) identification at a per-flow granularity is vital for traffic engineering, network management, and security practices. However, uncertainty is caused by a growing fraction of encrypted traffic such as Hypertext Transfer Protocol Secure. To address this challenge, we have carefully analyzed mobile app traffic (mainly including Domain Name System, Hypertext Transfer Protocol, and encrypted traffic such as Secure Sockets Layer and Transport Layer Security) and observed that (1) the sets of server hostnames queried by different apps are distinguishable; (2) mobile apps may query multiple server hostnames simultaneously, that is, apps may send several Domain Name System lookups within a short time interval; and (3) the encrypted traffic may be similar to various other network flows generated by the same app. Based on these three observations, in this article, we propose a novel app identification methodology for encrypted network flows. To be specific, temporal, lexical, and metadata similarity are investigated to select correlated traffic and information retrieving techniques are adopted to identify apps. We ran a thorough set of experiments to assess the performance of the proposed approaches. The experimental results show that the identification accuracy can be as high as 95%, and the proposed methods have low storage requirements as well as fast training speeds.
ISSN: 1550-1329
1550-1477
DOI: 10.1177/1550147718817292
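
Illustrative sketch (not part of the catalog record or the article): the first two observations in the abstract, that apps query distinguishable hostname sets and issue several DNS lookups within a short interval, could be combined roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method: the one-second burst window, the TF-IDF weighting with cosine similarity as the "information retrieval" step, the function names, and the app profiles and hostnames are all hypothetical.

```python
import math
from collections import Counter


def group_dns_queries(queries, window=1.0):
    """Group (timestamp, hostname) pairs into bursts.

    A query joins the current burst when it arrives within `window`
    seconds of the previous query (assumed threshold).
    """
    bursts, current, last_ts = [], [], None
    for ts, host in sorted(queries):
        if last_ts is not None and ts - last_ts > window:
            bursts.append(current)
            current = []
        current.append(host)
        last_ts = ts
    if current:
        bursts.append(current)
    return bursts


def build_idf(profiles):
    """Smoothed inverse document frequency of each hostname across app profiles."""
    n = len(profiles)
    df = Counter(h for hosts in profiles.values() for h in set(hosts))
    return {h: math.log((1 + n) / (1 + c)) + 1.0 for h, c in df.items()}


def tfidf(hosts, idf):
    """TF-IDF vector of a hostname list; hostnames unseen in training get weight 0."""
    tf = Counter(hosts)
    return {h: tf[h] * idf.get(h, 0.0) for h in tf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(h, 0.0) for h, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def identify_app(burst, profiles, idf):
    """Return the (app, score) whose hostname profile is most similar to the burst."""
    query = tfidf(burst, idf)
    scores = {app: cosine(query, tfidf(hosts, idf)) for app, hosts in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical training profiles: hostnames observed per app.
profiles = {
    "chat_app": ["api.chat.example", "cdn.shared.example", "push.chat.example"],
    "news_app": ["feed.news.example", "cdn.shared.example", "ads.news.example"],
}
idf = build_idf(profiles)

# Hypothetical DNS log: (seconds, queried hostname).
dns_log = [
    (0.00, "api.chat.example"),
    (0.05, "cdn.shared.example"),
    (0.10, "push.chat.example"),
    (30.0, "feed.news.example"),
    (30.2, "cdn.shared.example"),
]

for burst in group_dns_queries(dns_log):
    print(burst, identify_app(burst, profiles, idf))
```

In this sketch a hostname shared by several apps, such as the hypothetical cdn.shared.example, receives a lower weight than app-specific hostnames, so a burst is attributed mainly on the basis of its distinctive lookups. An encrypted flow could then be labeled with the app of the DNS burst that resolved its server address, which is one possible reading of the traffic correlation described in the abstract.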