A Protocol for Reconciling Recovery and High-Availability in Replicated Databases

Bibliographic Details
Main authors:	Armendáriz-Iñigo, J. E., Muñoz-Escoí, F. D., Decker, H., Juárez-Rodríguez, J. R., de Mendívil, J. R. González
Format:	Conference proceeding
Language:	English
Keywords:	Alive Node; Applied sciences; Computer science, control theory, systems; Exact sciences and technology; Local Transaction; Recovery Protocol; Snapshot Isolation; View Change
Online access:	Full text

Description
Abstract:	We describe a recovery protocol which boosts availability, fault tolerance and performance by enabling failed network nodes to resume an active role immediately after they start recovering. The protocol is designed to work in tandem with middleware-based eager update-everywhere strategies and related group communication systems. The latter provide view synchrony, i.e., knowledge about currently reachable nodes and about the status of messages delivered by faulty and alive nodes. That enables a fast replay of missed updates, which defines dynamic database recovery partitions. This speeds up the recovery of failed nodes, which, together with the rest of the network, may seamlessly continue to process transactions even before their recovery has completed. We specify the protocol in terms of the procedures executed with every message and event of interest and outline a correctness proof.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/11902140_67