Reliable and Distributed Network Monitoring via In-band Network Telemetry
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Traditional network monitoring solutions usually lack scalability because
of their centralized nature: a single controller collects heartbeats from all
network components. As a solution, the In-Band Network Telemetry (INT) framework
has recently been proposed to collect network telemetry information more
autonomously and in a more distributed manner by employing programmable switches.
However, it raises two further challenges: (i) finding suitable INT paths that
optimize control overhead and information freshness, and (ii) ensuring reliable
delivery of control information over multi-hop INT paths. In this work, we propose
a monitoring scheme, reliable Graph Partitioned INT (GPINT), by extending our
previous work and integrating shared queue ring (SQR) as a reliability feature
against potential failures in network telemetry collection caused by network
congestion and link degradation, which may lead to loss of network visibility. We
implement our proposal in P4, a recent data plane programming language, and
compare it with the traditional Simple Network Management Protocol (SNMP) and with
another state-of-the-art study that employs Euler's method for INT path
generation. Our analysis first shows the importance of a data recovery mechanism
against packet losses under different network conditions. Then, our emulation
results indicate that GPINT with the reliability extension performs much better
than the competing monitoring scheme in terms of telemetry collection latency and
overhead, even under a high amount of packet loss. |
---|---|
DOI: | 10.48550/arxiv.2212.14876 |