Cornerstone network with feature extractor: a metric-based few-shot model for Chinese natural sign language

Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2021-10, Vol. 51 (10), p. 7139-7150
Main authors: Wang, Fei; Li, Chen; Zeng, Zhen; Xu, Ke; Cheng, Sirui; Liu, Yanjun; Sun, Shizhuo
Format: Article
Language: English
Description
Summary: Standard Chinese natural sign language (CNSL) contains over 8,000 words. We consider dividing the task of CNSL recognition into multiple subtasks. Few-shot learning on the subtasks can achieve minimal acquisition cost and short-term training. However, the existing few-shot learning methods do not take into account the impact of ill-conditioned support samples, so we propose a new metric-based model, Cornerstone Network (CN), to complete the subtasks. CN is mainly composed of an optional feature extractor, an embedding network, and a cornerstone generator. The cornerstone generator is designed as a semi-supervised clusterer. Compared with other metric-based few-shot models, CN without the feature extractor improves 5-shot accuracy on Omniglot and miniImageNet. To verify the feasibility of our model on the task of CNSL recognition, we expanded the Chinese Natural Sign Language database, which integrates surface electromyography and inertial signals, from CNSL-80 to CNSL-139. The 5-shot accuracy on CNSL-139 increases from 65.25% to 68.83% compared with the state-of-the-art model. After connecting with the 1-D convolution feature extractor, using the Siamese Network's idea for secondary training, the accuracy increases by 10.38%. During the online test, the feature vector norms are used for selective matching (SM). Although the accuracy drops, it is still at least 5% higher than that without the feature extractor. Experimental results confirm the effectiveness of our model on 2-D images and 1-D time-series signals and the improvement of real-time recognition by SM.
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-020-02170-9
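
The summary describes the cornerstone generator only as a semi-supervised clusterer and gives no algorithm. As a rough illustration of that general idea, and not the authors' method, the sketch below refines per-class centres with a soft k-means step over unlabeled embeddings and then classifies queries by nearest centre. Everything in it is an assumption made for illustration: the function names `class_means`, `refine_centres`, and `classify`, the soft k-means update, the `temperature` parameter, and the synthetic 2-D data with one deliberately misplaced support sample.

```python
import numpy as np

def class_means(support, labels, n_classes):
    """Initial per-class centres: the mean of each class's support embeddings."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_classes)])

def refine_centres(centres, support, labels, unlabeled, n_iter=5, temperature=1.0):
    """Soft k-means refinement (an assumed stand-in for a semi-supervised clusterer).

    Labeled support points keep hard one-hot assignments; unlabeled points
    contribute through softmax weights over negative squared distances.
    """
    n_classes = centres.shape[0]
    hard = np.eye(n_classes)[labels]                      # (n_support, n_classes)
    points = np.concatenate([support, unlabeled], axis=0)
    for _ in range(n_iter):
        d2 = ((unlabeled[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        logits = -d2 / temperature
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        soft = np.exp(logits)
        soft /= soft.sum(axis=1, keepdims=True)
        weights = np.concatenate([hard, soft], axis=0)    # (n_points, n_classes)
        centres = (weights.T @ points) / weights.sum(axis=0)[:, None]
    return centres

def classify(queries, centres):
    """Metric-based prediction: index of the nearest centre (squared Euclidean)."""
    d2 = ((queries[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Synthetic 3-way, 5-shot episode in a 2-D embedding space (hypothetical data).
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 5)
support = true_means[labels] + rng.normal(scale=0.5, size=(15, 2))
support[0] = [5.0, 5.0]                                   # one ill-conditioned support sample
query_labels = np.repeat(np.arange(3), 30)
queries = true_means[query_labels] + rng.normal(scale=0.5, size=(90, 2))

initial = class_means(support, labels, 3)
# The query embeddings double as the unlabeled set (a transductive setting).
refined = refine_centres(initial, support, labels, queries)
predictions = classify(queries, refined)
```

In this toy episode the misplaced support sample drags the plain class mean away from its cluster, and the unlabeled points pull the refined centre back toward it. Whether the paper's cornerstones are computed this way cannot be determined from the summary.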