Performance Guarantee of an Approximate Dynamic Programming Policy for Robotic Surveillance
This paper is focused on the development and analysis of suboptimal decision algorithms for a collection of robots that assist a remotely located operator in perimeter surveillance. The operator is tasked with the classification of incursions across the perimeter. whenever there is an incursion into...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on automation science and engineering 2016-04, Vol.13 (2), p.564-578 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper is focused on the development and analysis of suboptimal decision algorithms for a collection of robots that assist a remotely located operator in perimeter surveillance. The operator is tasked with the classification of incursions across the perimeter. whenever there is an incursion into the perimeter, an unattended ground sensor (UGS) in the vicinity, signals an alert. A robot services the alert by visiting the alert location, collecting information, e.g., photo and video imagery, and transmitting it to the operator. The accuracy of operator's classification depends on the volume and freshness of information gathered and provided by the robots at locations where incursions occur. There are two competing objectives for a robot: it needs to spend adequate time at an alert location to collect evidence for aiding the operator in accurate classification but it also needs to service other alerts as soon as possible, so that the evidence collected is relevant. The decision problem is to determine the optimal amount of time a robot must spend servicing an alert. The incursions are stochastic and their statistics are assumed to be known. This problem can be posed as a Markov Decision Problem. However, even for two robots and five UGS locations, the number of states is of the order of billions rendering exact dynamic programming methods intractable. Approximate dynamic programming (ADP) via linear programming (LP) provides a way to approximate the value function and derive suboptimal strategies. The novel feature of this paper is the derivation of a tractable lower bound via LP and the construction of a suboptimal policy whose performance improves upon the lower bound. An illustrative perimeter surveillance example corroborates the results derived in this paper. Note to Practitioners-In practice, one often encounters the curse of dimensionality in the application of dynamic programming to determine optimal policies for controlled Markov chains. This is true, in particular, for dynamic scheduling problems involving multiple robots/servers and queues of tasks that arrive in a stochastic fashion. The computation of value function, critical to the determination of optimal policies, is nearly impractical. Hence, one must settle for suboptimal policies. Two natural questions arise: (1) How does one construct a suboptimal policy? (2) How "good" is the constructed suboptimal policy? A common strategy to tackle the first problem is to approximate the value funct |
---|---|
ISSN: | 1545-5955 1558-3783 |
DOI: | 10.1109/TASE.2014.2366295 |