Recovering from distributable thread failures in distributed real-time Java

We consider the problem of recovering from the failures of distributable threads ("threads") in distributed real-time systems that operate under runtime uncertainties including those on thread execution times, thread arrivals, and node failure occurrences. When a thread experiences a node...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:ACM transactions on embedded computing systems 2010-08, Vol.10 (1)
Hauptverfasser: Curley, Edward, Ravindran, Binoy, Anderson, Jonathan, Jensen, E Douglas
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We consider the problem of recovering from the failures of distributable threads ("threads") in distributed real-time systems that operate under runtime uncertainties including those on thread execution times, thread arrivals, and node failure occurrences. When a thread experiences a node failure, the result is a broken thread having an orphan. Under a termination model, the orphans must be detected and aborted, and exceptions must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. Our application/scheduling model includes the proposed distributable thread programming model for the emerging Distributed Real-Time Specification for Java (DRTSJ), together with an exception-handler model. Threads are subject to time/utility function (TUF) time constraints and an utility accrual (UA) optimality criterion. A key underpinning of the TUF/UA scheduling paradigm is the notion of "best-effort" where higher importance threads are always favored over lower importance ones, irrespective of thread urgency as specified by their time constraints. We present a thread scheduling algorithm called HUA and a thread integrity protocol called TPR. We show that HUA and TPR bound the orphan cleanup and recovery time with bounded loss of the best-effort property. Our implementation experience for HUA/TPR in the Reference Implementation of the proposed programming model for the DRTSJ demonstrates the algorithm/protocol's effectiveness.
ISSN:1539-9087
DOI:10.1145/1167999.1168002