NODE FAILURE DETECTION AND RESOLUTION IN DISTRIBUTED DATABASES

Bibliographic details
Main authors: BODAGALA SREENATH, SHAULL ROSS, SMITH PAUL DAVID
Format: Patent
Language: Chinese; English
Description
Summary: Methods and systems to detect and resolve failures in a distributed database system are described herein. A first node in the distributed database system can detect an interruption in communication with at least one other node in the distributed database system. Such an interruption indicates a network failure. In response to detecting this failure, the first node starts a failure resolution protocol, which invokes coordinated broadcasts of each node's list of suspicious nodes among neighbor nodes. Each node compares its own list of suspicious nodes with its neighbors' lists to determine which nodes are still directly connected to each other. Each node then determines the largest group of these directly connected nodes and whether it belongs to that group. A node that is not in that group fails itself to resolve the network failure.
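
The decision step in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes every node has already received all suspect lists, treats two nodes as directly connected when neither suspects the other, searches for the largest fully connected group by brute force, and breaks ties by sorted node name. The function names, the tie-break rule, and the sample data are hypothetical.

```python
from itertools import combinations


def connected(a, b, suspects):
    """Two nodes count as directly connected if neither suspects the other."""
    return b not in suspects[a] and a not in suspects[b]


def largest_connected_group(suspects):
    """Return the largest set of nodes that are all pairwise connected.

    Brute force over subsets, largest first, so only suitable for a handful
    of nodes. Ties are broken by sorted node order (an assumption of this
    sketch, not something the summary specifies).
    """
    nodes = sorted(suspects)
    for size in range(len(nodes), 0, -1):
        for group in combinations(nodes, size):
            if all(connected(a, b, suspects) for a, b in combinations(group, 2)):
                return set(group)
    return set()


def should_fail(node, suspects):
    """A node fails itself when it is not in the largest connected group."""
    return node not in largest_connected_group(suspects)


# Hypothetical scenario: the link between C and D is down, so each suspects the other.
suspects = {
    "A": set(),
    "B": set(),
    "C": {"D"},
    "D": {"C"},
}

group = largest_connected_group(suspects)
failing = sorted(n for n in suspects if should_fail(n, suspects))
print(sorted(group), failing)
```

Because every node evaluates the same lists with the same deterministic tie-break, all nodes in this sketch arrive at the same surviving group, so exactly one of C and D removes itself.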