An adaptive adjustment strategy for bolt posture errors based on an improved reinforcement learning algorithm

Bibliographic details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2021-06, Vol.51 (6), p.3405-3420
Main authors:	Luo, Wentao; Zhang, Jianfu; Feng, Pingfa; Liu, Haochen; Yu, Dingwen; Wu, Zhijun
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Assembly; Computer Science; Computer Science, Artificial Intelligence; Efficiency; Machine learning; Machines; Manufacturing; Mechanical Engineering; Model accuracy; Optimization; Probability theory; Processes; Science & Technology; Technology
Online access:	Full text

Description
Summary:	Designing an intelligent and autonomous system remains a great challenge in the assembly field. Most reinforcement learning (RL) methods are applied to experiments with relatively small state spaces. However, the complicated conditions and high-dimensional spaces of the assembly environment cause traditional RL methods to perform poorly in terms of efficiency and accuracy. In this paper, a model-driven adaptive proximal policy optimization (MAPPO) method is presented that enables the assembly system to autonomously rectify bolt posture errors. In the MAPPO method, a probabilistic tree and an adaptive reward mechanism are used to improve the computational efficiency and accuracy of the traditional PPO method. The size of the action space is reduced by establishing a hierarchical logical relationship among the parameters with a probabilistic tree. The adaptive reward mechanism mitigates the algorithm's tendency to fall into local minima. Finally, the proposed method was verified with the Unity simulation engine. The advantages and robustness of the proposed model were also validated by comparing different cases in simulations and experiments. The results revealed that MAPPO has better efficiency and accuracy than other state-of-the-art algorithms.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-020-01906-x
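
The summary states that a probabilistic tree reduces the size of the action space by arranging the parameters hierarchically. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' method. The parameter names (`AXES`, `DIRECTIONS`, `STEPS`), their values, and the uniform probabilities are all hypothetical. The sketch shows how sampling one level at a time replaces a single choice among 36 joint actions with three smaller choices of at most 6 options each.

```python
import itertools
import random

# Hypothetical posture-adjustment parameters; none of these come from the paper.
AXES = ["x", "y", "z", "roll", "pitch", "yaw"]
DIRECTIONS = [-1, 1]
STEPS = [0.1, 0.5, 1.0]
LEVELS = [AXES, DIRECTIONS, STEPS]


def build_uniform_tree(levels):
    """Build a tree with one level per parameter and uniform branch probabilities."""
    if not levels:
        return None  # below the last level: nothing left to choose
    options = levels[0]
    return {
        "options": options,
        "probs": [1.0 / len(options)] * len(options),
        # Each option has its own subtree, so later choices can depend on earlier ones.
        "children": [build_uniform_tree(levels[1:]) for _ in options],
    }


def sample_action(node, rng):
    """Walk from the root to a leaf; return the action and its path probability."""
    action, prob = [], 1.0
    while node is not None:
        idx = rng.choices(range(len(node["options"])), weights=node["probs"], k=1)[0]
        action.append(node["options"][idx])
        prob *= node["probs"][idx]  # probability of the path is the product of branches
        node = node["children"][idx]
    return tuple(action), prob


# Size of the flat joint action space versus the largest single decision in the tree.
flat_size = len(list(itertools.product(*LEVELS)))
max_branching = max(len(options) for options in LEVELS)

rng = random.Random(0)
action, prob = sample_action(build_uniform_tree(LEVELS), rng)
print("flat joint actions:", flat_size)
print("largest single decision:", max_branching)
print("sampled action:", action, "with probability", prob)
```

In a learned policy the branch probabilities would come from the policy network rather than being uniform; the adaptive reward mechanism mentioned in the summary is not sketched here because the record does not describe it.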