TSCMDL: Multimodal Deep Learning Framework for Classifying Tree Species Using Fusion of 2-D and 3-D Features

Bibliographic details
Published in: IEEE Transactions on Geoscience and Remote Sensing 2023, Vol. 61, p. 1-11
Main authors: Liu, Bingjie; Hao, Yuanshuo; Huang, Huaguo; Chen, Shuxin; Li, Zengyuan; Chen, Erxue; Tian, Xin; Ren, Min
Format: Article
Language: English
Description
Summary: Accurate tree species information is a prerequisite for forest resource management. Combining light detection and ranging (LiDAR) and image data is one of the main methods of tree species classification. Traditional machine learning methods rely on expert knowledge to calculate a large number of feature parameters. Deep learning technology can directly use the original image and point cloud data to classify tree species. However, data with different patterns require different types of deep learning methods. In this study, a tree species classification multimodal deep learning (TSCMDL) framework that fuses 2-D and 3-D features was constructed and then used to combine data from multiple sources for tree species classification. This framework uses an improved version of the PointMLP model as its backbone network and uses ResNet50 and PointMLP networks to extract the image features and point cloud features, respectively. The proposed framework was tested using unmanned aerial vehicle LiDAR (UAV LiDAR) data and red, green, blue (RGB) orthophotos. The results showed that the accuracy of tree species classification using the TSCMDL framework was 98.52%, which was 4.02% higher than that based on point cloud features only. In addition, when the same hyperparameters were used for training, the training efficiency of the model was not significantly lower than that of models based on point cloud features only. The proposed multimodal deep learning framework extracts features directly from the original data and integrates them effectively, thus avoiding manual feature screening and achieving more accurate classification. The feature extraction network used in the TSCMDL framework can be replaced by other suitable networks, and the framework has strong application potential.
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2023.3266057
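
The summary above describes extracting 2-D features from RGB orthophotos and 3-D features from LiDAR point clouds, then fusing them for classification. The record does not say how the fusion is done, so the following is only a minimal sketch that assumes fusion by simple concatenation. The functions `image_features` and `point_features` are hypothetical stand-ins for the ResNet50 and PointMLP branches (they compute toy statistics, not learned features), and the classifier weights are random placeholders rather than trained values.

```python
import math
import random


def image_features(rgb_patch):
    """Hypothetical stand-in for a 2-D branch such as ResNet50: per-channel mean."""
    n = len(rgb_patch)
    return [sum(px[c] for px in rgb_patch) / n for c in range(3)]


def point_features(points):
    """Hypothetical stand-in for a 3-D branch such as PointMLP: height statistics."""
    zs = [p[2] for p in points]
    return [max(zs), sum(zs) / len(zs), max(zs) - min(zs)]


def fuse(img_feat, pts_feat):
    """Assumed fusion step: concatenate the two feature vectors."""
    return img_feat + pts_feat


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Linear layer followed by softmax over the fused feature vector."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Synthetic inputs for one tree crown: 16 RGB pixels and 64 LiDAR points (x, y, z).
rng = random.Random(0)
rgb_patch = [(rng.random(), rng.random(), rng.random()) for _ in range(16)]
points = [(rng.random(), rng.random(), rng.random() * 20.0) for _ in range(64)]

fused = fuse(image_features(rgb_patch), point_features(points))

# Untrained placeholder weights for a hypothetical 4-species problem.
n_species = 4
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(n_species)]
biases = [0.0] * n_species

probs = classify(fused, weights, biases)
predicted = probs.index(max(probs))
```

Keeping the two branches behind separate functions mirrors the summary's remark that the feature extraction network can be swapped for another suitable one: only `image_features` or `point_features` would need to change, as long as each still returns a fixed-length vector.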