Predicting Query Duplication with Box-Jenkins Models and Its Applications

Bibliographic details
Main authors: Xinyao Hu, Shicong Meng, Cong Shi, Dingyi Han, Yong Yu
Format: Conference paper
Language: English
Description
Abstract: Much previous work on peer-to-peer traffic characterization and modeling has focused on the distribution of query contents. However, little has been done toward a better understanding of the time series distribution of these queries, which is vital for system performance. To remedy this situation, this paper characterizes query traffic by using automatic time series analysis to evaluate different linear models (Box-Jenkins models and some simple windowed-mean models) for predicting the number of duplicated queries from 10 minutes to 2 hours into the future. Both the predictive power and the computational costs of these models are evaluated over 318,942,450 real-world Gnutella queries collected over 3 months. We find that the number of duplicated queries is consistently predictable. Simple, practical models like AR perform well on prediction. To show that these characteristics have a wide range of potential applications, we propose two enhancements to existing search result caching and load balancing algorithms. Our simulation study shows that our methodology works quite well in both scenarios in terms of efficiency and effectiveness. The main contributions of this paper are: (1) proposing new measurement techniques on Gnutella, (2) characterizing and modeling peer-to-peer query traffic with Box-Jenkins models, and (3) presenting a general enhancement to existing performance optimization algorithms in P2P systems.
ISSN:2161-3559
DOI:10.1109/P2P.2007.21
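
Illustrative sketch (not from the paper): the abstract compares Box-Jenkins models such as AR against simple windowed-mean predictors. The Python below is a minimal, hypothetical illustration of that comparison, not the authors' implementation. Everything in it is an assumption made for the example: the function names, the AR order of 2, the window of 6, the synthetic series standing in for per-interval duplicate-query counts (no Gnutella data is used), and the one-step-ahead horizon (the paper considers horizons from 10 minutes to 2 hours). The AR coefficients are estimated from the Yule-Walker equations using the Levinson-Durbin recursion.

```python
import random


def windowed_mean_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def fit_ar(series, order):
    """Estimate the mean and AR(order) coefficients via Yule-Walker equations,
    solved with the Levinson-Durbin recursion."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Biased sample autocovariances r[0..order].
    r = [sum(centered[t] * centered[t - k] for t in range(k, n)) / n
         for k in range(order + 1)]
    coeffs, error = [], r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / error
        coeffs = [coeffs[j] - reflection * coeffs[k - 2 - j]
                  for j in range(k - 1)] + [reflection]
        error *= 1.0 - reflection ** 2
    return mean, coeffs


def ar_forecast(history, mean, coeffs):
    """One-step-ahead AR prediction from the most recent observations."""
    return mean + sum(a * (history[-1 - i] - mean) for i, a in enumerate(coeffs))


def synthetic_counts(n, phi=0.8, level=100.0, seed=0):
    """Hypothetical stand-in for duplicate-query counts: an AR(1) process
    fluctuating around `level`. Not real measurement data."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(level + x)
    return out


def one_step_mse(series, split, predict):
    """Mean squared one-step-ahead error over series[split:]."""
    errors = [(series[t] - predict(series[:t])) ** 2
              for t in range(split, len(series))]
    return sum(errors) / len(errors)


series = synthetic_counts(2000)
split = 1500  # fit on the first 1500 points, evaluate on the rest
mean, coeffs = fit_ar(series[:split], order=2)
ar_mse = one_step_mse(series, split, lambda h: ar_forecast(h, mean, coeffs))
wm_mse = one_step_mse(series, split, lambda h: windowed_mean_forecast(h, window=6))
print(f"AR(2) coefficients: {coeffs}")
print(f"one-step MSE, AR(2): {ar_mse:.3f}; windowed mean: {wm_mse:.3f}")
```

Because the synthetic series is itself autoregressive, this setup favors the AR predictor by construction; it shows only how such a comparison can be set up and says nothing about the results reported in the paper.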