Censored deep reinforcement patrolling with information criterion for monitoring large water resources using Autonomous Surface Vehicles
Published in: | Applied soft computing 2023-01, Vol.132, p.109874, Article 109874 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Monitoring and patrolling large water resources is a major challenge for nature conservation. Acquiring data from an underlying environment that usually changes over time requires a proper formulation of the information. The use of Autonomous Surface Vehicles equipped with water quality sensor modules can serve as an early-warning system for contamination peak detection, algae bloom monitoring, or oil-spill scenarios. In addition to gathering information, the vehicle must plan obstacle-free routes on non-convex static and dynamic maps. This work proposes a novel framework that obtains a collision-free policy from deterministic knowledge of the environment, by means of a censoring operator and noisy networks, and that addresses informative path planning with emphasis on temporal patrolling. Using information gain as a measure of the uncertainty reduction over data, a Deep Q-Learning algorithm improved by a Q-Censoring mechanism for model-based obstacle avoidance is proposed. The obtained results demonstrate the effectiveness of the proposed algorithm for both cases in the Ypacaraí monitoring task. Simulations showed that noisy networks are a good choice for enhanced exploration, with 3 times less redundancy in the paths than an ε-greedy policy. Previous coverage strategies are also outperformed, by 13% on average in the accuracy of the obtained contamination model and by 37% in the detection of dangerous contamination peaks. Finally, the achieved results indicate the appropriateness of the proposed framework for monitoring scenarios with autonomous vehicles.
• A Reinforcement Learning method for Informative Patrolling using Maritime Vehicles.
• A Censored Q-Function method for collision avoidance in Path Planning with obstacles.
• A novel formulation of the temporal patrolling, agent observation, and reward.
• A performance comparison between the novel approach and other methods for validation. |
---|---|
ISSN: | 1568-4946 1872-9681 |
DOI: | 10.1016/j.asoc.2022.109874 |
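
The summary describes Q-Censoring only at a high level: known map information is used so that the learned policy never selects a colliding action. The following is a minimal sketch of that general idea (action masking over Q-values), not the authors' implementation. The eight-move action set, the grid encoding (1 = water, 0 = land), and all function names are assumptions made for illustration.

```python
import math

# Eight compass moves as (row, col) offsets; this action set is an assumption.
ACTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def is_navigable(grid, row, col):
    """Return True if (row, col) lies inside the map and is water (1)."""
    return 0 <= row < len(grid) and 0 <= col < len(grid[0]) and grid[row][col] == 1


def censor_q_values(q_values, grid, position):
    """Replace the Q-value of every colliding action with -inf."""
    row, col = position
    censored = []
    for q, (d_row, d_col) in zip(q_values, ACTIONS):
        if is_navigable(grid, row + d_row, col + d_col):
            censored.append(q)
        else:
            censored.append(-math.inf)
    return censored


def greedy_safe_action(q_values, grid, position):
    """Pick the highest-valued action among those that do not collide."""
    censored = censor_q_values(q_values, grid, position)
    best = max(range(len(censored)), key=lambda i: censored[i])
    if censored[best] == -math.inf:
        raise ValueError("no collision-free action available")
    return best


# Toy 3x3 map (hypothetical): 1 = water, 0 = land. The vehicle is in the centre.
GRID = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
# Hypothetical Q-values, one per entry in ACTIONS.
Q_VALUES = [0.9, 0.8, 0.3, 0.7, 0.5, 0.6, 0.4, 0.95]

if __name__ == "__main__":
    print(greedy_safe_action(Q_VALUES, GRID, (1, 1)))
```

In this toy case the uncensored maximum (index 7) would drive the vehicle onto land, so the censored choice falls back to the best navigable move. Because the mask comes from the map rather than from learned penalties, collisions are excluded by construction instead of being discouraged through the reward.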