Iterative reward shaping for non-overshooting altitude control of a wing-in-ground craft based on deep reinforcement learning

Bibliographic details
Published in: Robotics and Autonomous Systems, 2023-05, Vol. 163, p. 104383, Article 104383
Main authors: Hu, Huan; Zhang, Guiyong; Ding, Lichao; Jiao, Kuikui; Zhang, Zhifan; Zhang, Ji
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Summary: When a wing-in-ground craft (WIG) adjusts its flying altitude, overshooting behavior may occur, which weakens safety and stealth. In previous studies on path following, cross-track error was used together with other indicators to suppress overshoot indirectly. This paper proposes a method for direct and gradual suppression of overshoot via deep reinforcement learning (DRL), which iterates the reward function by adding a partial reward based on the current overshoot magnitude. Each time an overshoot is obtained by DRL, a function of this overshoot is added to the reward function for retraining. The function is defined as a type of cross-track error acting within the range between the current overshoot magnitude and the target altitude, and it contributes the partial reward before the WIG reaches a worse overshoot during training. The feasibility of the method is proved by mathematical reasoning, and an example of a virtual WIG changing its altitude is used to validate it. Assuming the added partial function takes a basic first-order fractional form of the cross-track error multiplied by a factor, the iterative reward shaping decreases the overshoot to a minimal level, with the overshoot reduced by over 99.8% relative to the initial one. Moreover, the influence of the factor on overshoot is analyzed when the partial reward function is introduced in the first iteration. For a WIG's altitude adjustment, the method monotonically reduces overshoot within tolerance.
• Effectiveness. Each time the reward function is iterated, the overshoot of the newly trained trajectory is suppressed.
• Constraint. With more iterations, convergence of training becomes harder to achieve under the same training scheme.
• Characteristic. The degree of overshoot suppression is not related to the factor of the added partial reward function.
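To make the procedure summarized above concrete, the sketch below shows one way the iterated partial reward could be set up. The base tracking reward, the first-order fractional form k / (1 + |e|), the assumption that overshoot occurs above the target altitude (a climb), and the train_and_measure stand-in for a full DRL training run are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the iterative reward-shaping idea, under assumed forms.

def shaped_reward(altitude, target, overshoot_prev, k=1.0):
    """Base tracking reward plus a partial term counted only while the craft
    stays inside the band between the target altitude and the previously
    observed overshoot peak (assumed to lie above the target)."""
    e = abs(altitude - target)      # cross-track (altitude) error
    reward = -e                     # simple base tracking reward (assumed)
    # Partial reward: first-order fractional form of the error, scaled by k,
    # applied only before the craft exceeds the previous overshoot magnitude.
    if 0.0 < (altitude - target) <= overshoot_prev:
        reward += k / (1.0 + e)
    return reward


def iterate_reward_shaping(train_and_measure, overshoot_init, n_iters=5, k=1.0):
    """Each iteration retrains the agent with the current shaped reward,
    measures the resulting overshoot, and tightens the band for the next
    round. `train_and_measure` is a hypothetical callable standing in for a
    full DRL training run that returns the new overshoot magnitude."""
    overshoot = overshoot_init
    for _ in range(n_iters):
        reward_fn = lambda alt, tgt, o=overshoot: shaped_reward(alt, tgt, o, k)
        overshoot = train_and_measure(reward_fn)
    return overshoot
```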
ISSN: 0921-8890
1872-793X
DOI: 10.1016/j.robot.2023.104383