A novel classification model of collective user web behaviour based on network traffic contents

Bibliographic details
Published in: IET Networks 2021-07, Vol.10 (4), p.173-184
Main authors: Liu, Hongri; Wang, Shuo; Wei, Yuliang; Wang, Bailing
Format: Article
Language: English
Description
Summary: Web behaviour analysis of collective users provides a powerful means of studying collective user interests on the Internet. However, existing research analyses only the behaviour of a single user who accesses multiple applications, or of multiple users who access one application. The authors propose a web behaviour classification model for collective users, in which the title fields in HTTP flows are extracted from mirrored network traffic captured over any given period of time. The title fields, treated as short summaries of the web pages browsed by the users, are vectorized using natural language processing techniques. Specifically, the Latent Dirichlet Allocation (LDA) algorithm is used to calculate the topic distribution probability matrix. Multi-class classifiers are then trained and tested on the manually labelled probability distribution matrix output by the LDA algorithm to classify the user behaviour topics. The experiments demonstrate that the model reaches its highest classification accuracy, 81.2%, when the LDA algorithm is combined with a Random Forest classifier.
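Illustration: the first step named in the summary, pulling the title field out of an HTTP flow, might look roughly like the sketch below. It is not the authors' implementation. It assumes each response has already been reassembled from the mirrored traffic into one uncompressed, unchunked, UTF-8 byte string, and the names `TitleParser` and `extract_title` are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self._parts).split())


def extract_title(http_response: bytes) -> str:
    """Return the page title from a raw HTTP response, or '' if there is none.

    Assumption: the body is plain (not gzip-compressed or chunked) UTF-8 HTML.
    """
    # Headers and body are separated by the first blank line.
    _, _, body = http_response.partition(b"\r\n\r\n")
    parser = TitleParser()
    parser.feed(body.decode("utf-8", errors="replace"))
    parser.close()
    return parser.title
```

The extracted titles would then serve as the short documents that the summary describes as being vectorized and passed to LDA. Real traffic would need extra handling for compression, chunked transfer encoding and other character sets.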
ISSN: 2047-4954, 2047-4962
DOI: 10.1049/ntw2.12010