Dynamic Fault Tolerance for Multi-Node Query Processing

Parallel processing is a typical approach to answer analytical queries on large database. As the size of the database increases, we often try to increase the parallelism by incorporating more processing nodes. However, this approach increases the possibility of node failure as well. According to the...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEICE Transactions on Information and Systems 2022/05/01, Vol.E105.D(5), pp.909-919
Hauptverfasser: BESSHO, Yutaro, HAYAMIZU, Yuto, GODA, Kazuo, KITSUREGAWA, Masaru
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Parallel processing is a typical approach to answer analytical queries on large database. As the size of the database increases, we often try to increase the parallelism by incorporating more processing nodes. However, this approach increases the possibility of node failure as well. According to the conventional practice, if a failure occurs during query processing, the database system restarts the query processing from the beginning. Such temporal cost may be unacceptable to the user. This paper proposes a fault-tolerant query processing mechanism, named PhoeniQ, for analytical parallel database systems. PhoeniQ continuously takes a checkpoint for every operator pipeline and replicates the output of each stateful operator among different processing nodes. If a single processing node fails during query processing, another can promptly take over the processing. Hence, PhoneniQ allows the database system to efficiently resume query processing after a partial failure event. This paper presents a key design of PhoeniQ and prototype-based experiments to demonstrate that PhoeniQ imposes negligible performance overhead and efficiently continues query processing in the face of node failure.
ISSN:0916-8532
1745-1361
DOI:10.1587/transinf.2021DAP0004