Algorithm-Based Fault Tolerance Applied to P2P Computing Networks

Bibliographic Details
Main authors:	Roche, T.; Cunche, M.; Roch, J.-L.
Format:	Conference paper
Language:	English
Subjects:	ABFT; Checkpointing; Computer networks; Computer Science; Cryptography and Security; distributed computing; Distributed, Parallel, and Cluster Computing; Fault tolerance; Fault tolerant systems; Galois fields; High performance computing; Linear code; linear coding; P2P; Parity check codes; Peer to peer computing; Reed-Solomon codes; SUMMA

Description
Abstract:	P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the previous disk-less checkpointing approach to fault tolerance in high-performance computing systems. Our contribution is twofold. First, instead of restricting ourselves to 2D checksums, which tolerate only a small number of node failures, we base disk-less checkpointing on linear codes, which can potentially tolerate a large number of faults. Second, we compare low-density parity-check (LDPC) codes with classical Reed-Solomon (RS) codes under different fault models suited to P2P systems. Our LDPC disk-less checkpointing method is well suited to settings where only node disconnections are considered, but it cannot deal with Byzantine peers. Our RS disk-less checkpointing method tolerates such Byzantine errors, but it is restricted to exact finite-field computations.
DOI:	10.1109/AP2PS.2009.30
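
Illustrative sketch (not taken from the paper): the abstract's idea of basing disk-less checkpointing on a linear code can be pictured with a small Reed-Solomon-style example over a prime field. The sketch below assumes a Vandermonde encoding, a field GF(257), checksum nodes that never fail, and erasures only (disconnected peers whose identities are known). The names `P`, `encode`, `solve_mod` and `recover` are hypothetical. Correcting Byzantine errors would need a full RS decoder, which is not shown.

```python
P = 257  # assumed small prime; all arithmetic is exact, in GF(P)


def encode(data, m):
    """Return m checksum vectors, c_j = sum_i (i+1)^j * data[i] (mod P)."""
    width = len(data[0])
    return [
        [sum(pow(i + 1, j, P) * row[t] for i, row in enumerate(data)) % P
         for t in range(width)]
        for j in range(m)
    ]


def solve_mod(A, B):
    """Solve A X = B over GF(P) by Gauss-Jordan elimination (B holds row vectors)."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        piv = next(r for r in range(col, n) if M[r][col] % P)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, P)  # modular inverse (Python 3.8+)
        M[col] = [v * inv % P for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % P for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def recover(data, checks, lost):
    """Rebuild the rows of `data` listed in `lost`; needs len(lost) <= len(checks)."""
    e, width = len(lost), len(checks[0])
    # Vandermonde submatrix on the lost nodes; invertible since the points differ.
    A = [[pow(i + 1, j, P) for i in lost] for j in range(e)]
    B = []
    for j in range(e):
        # Subtract the contribution of the surviving nodes from checksum j.
        B.append([
            (checks[j][t] - sum(pow(i + 1, j, P) * data[i][t]
                                for i in range(len(data)) if i not in lost)) % P
            for t in range(width)
        ])
    return dict(zip(lost, solve_mod(A, B)))


# Four compute nodes, two checksum nodes: any two lost data nodes can be rebuilt.
state = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [250, 0, 17]]
checks = encode(state, 2)
damaged = [state[0], None, state[2], None]  # nodes 1 and 3 disconnected
restored = recover(damaged, checks, [1, 3])
assert restored == {1: state[1], 3: state[3]}
```

With `m` checksum nodes, this sketch rebuilds up to `m` lost data nodes, whereas a single parity checksum covers only one loss. Working modulo a prime also illustrates the restriction to exact finite-field computation that the abstract mentions for the RS method.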