Multi-scale latent space sequential fusion of images for pose estimation of underground tunneling machinery

Bibliographic details
Published in: Engineering Applications of Artificial Intelligence, 2025-02, Vol. 141, p. 109786, Article 109786
Main authors: Wu, Hongzhuang; Cheng, Cheng; Zhang, Deyi; Zhou, Hongzhi
Format: Article
Language: English
Description
Abstract: Recently, underground unmanned tunneling technology has received increasing attention, and pose estimation of the tunneling machinery plays a central role in it. However, autonomous positioning of tunneling machinery is highly challenging because of the harsh underground environment and extremely complicated working conditions, which severely restrict the development and application of intelligent underground tunneling technology. This paper therefore investigates a pose estimation method for underground tunneling machinery based on machine vision and deep learning. Since both the effective features and the interference features of time-series images exhibit spatiotemporal correlations across multiple scales, we propose a multi-scale latent space sequential fusion (MSLSSF) model that integrates information from time-series images to estimate the pose variables of the tunneling machine. The model employs a multi-scale variational autoencoder (MSVAE) to obtain multi-scale latent space representations of each image in the sequence. The MSLSSF strategy consists of two steps. First, because the representations of sequential images may be correlated at each scale, the latent representations of the time-series images are fused scale by scale using long short-term memory (LSTM) models. Second, an attention mechanism adaptively fuses the outputs of the first step across all scales and timesteps. These fusion methods allow the model to take full advantage of the multi-scale features of time-series images. Experiments on our custom-made tunneling machine visual localization dataset show that the MSLSSF-based pose estimation method outperforms the advanced comparison methods in both accuracy and robustness, validating the efficacy of the proposed approach. Additionally, visualization results of the proposed model confirm the feasibility of the MSLSSF strategy.
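The two-step fusion described in the abstract can be illustrated with a minimal NumPy sketch, assuming that the MSVAE has already produced one latent vector per image at each of S scales. It is not the authors' implementation: the MSVAE encoder is omitted (random vectors stand in for its latent codes), the weights are untrained random values, and the hidden size, the 6-dimensional pose output, the additive (tanh) form of the attention scores, and all class and function names (`LSTM`, `SequentialFusionSketch`) are hypothetical choices made only for illustration. Step 1 runs a separate LSTM over the time sequence at each scale; step 2 scores every (scale, timestep) hidden state, normalizes the scores with a softmax over all S*T states, and regresses the pose from the weighted sum.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


class LSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, scale, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def __call__(self, seq):
        """seq: (T, input_dim) -> hidden states (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x_t in seq:
            gates = np.concatenate([x_t, h]) @ self.W + self.b
            i, f, g, o = np.split(gates, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)


class SequentialFusionSketch:
    """Hypothetical two-step fusion: per-scale LSTMs, then attention over scales and timesteps."""

    def __init__(self, latent_dims, hidden_dim=32, pose_dim=6, seed=0):
        rng = np.random.default_rng(seed)
        # Step 1: one LSTM per scale, since each scale has its own latent size.
        self.lstms = [LSTM(d, hidden_dim, rng) for d in latent_dims]
        # Step 2: additive attention parameters shared by all (scale, timestep) states.
        self.W_a = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.v_a = rng.normal(0.0, 0.1, hidden_dim)
        # Linear head mapping the fused vector to the (assumed 6-D) pose.
        self.W_out = rng.normal(0.0, 0.1, (hidden_dim, pose_dim))

    def __call__(self, latents):
        """latents[s]: (T, latent_dims[s]) -> (pose, attention weights over S*T states)."""
        states = np.concatenate([lstm(z) for lstm, z in zip(self.lstms, latents)])  # (S*T, H)
        scores = np.tanh(states @ self.W_a) @ self.v_a  # one score per state, (S*T,)
        weights = softmax(scores)                       # normalized across scales and timesteps
        fused = weights @ states                        # weighted sum, (H,)
        return fused @ self.W_out, weights


# Usage: 3 scales, 5 timesteps; random vectors stand in for MSVAE latent codes.
rng = np.random.default_rng(1)
T = 5
latent_dims = [64, 32, 16]
latents = [rng.normal(size=(T, d)) for d in latent_dims]
model = SequentialFusionSketch(latent_dims)
pose, weights = model(latents)
print(pose.shape, weights.shape)
```

Because the softmax is taken jointly over all scales and timesteps, the attention weights can favour, for example, a coarse scale at one timestep and a fine scale at another, which is one plausible reading of "adaptively fuse the results from the first step across all scales and timesteps".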
ISSN:0952-1976
DOI:10.1016/j.engappai.2024.109786