Constructing better prototype generators with 3D CNNs for few-shot text classification
Published in: | Expert systems with applications 2023-09, Vol.225, p.120124, Article 120124 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Prototypical networks are a key algorithm for few-shot problems. Previous prototypical-network-based methods average the sentence embeddings of a class to obtain the corresponding class representation. (The terms prototype, proto, class representation, and category representation are used interchangeably.) However, this simple averaging fails to model the importance of word-level information to the class representation effectively, which limits the quality of the prototype. In this work, we propose a 3D CNN based 3D Convolution Prototypical Network (3DCPN), which is mainly composed of two parts. (All acronyms and their explanations in this study are listed in Table 12.) First, to focus more effectively on the importance of word-level information from the prototype perspective, we use a 3D CNN to process the word embeddings of the same class. 3D CNNs are good at capturing semantic correlations across multiple objects, so we use them in place of averaging to generate better class representations. Second, we construct a 2D semantic mining layer as the second part of 3DCPN to extract deep features from query embeddings. A symmetric model structure is designed to ensure feature matching between the class representation and the query representation. We then obtain the similarity between the prototype representation and the query representation with a metric function. Based on the resulting similarity matrix, we introduce a temperature-coefficient-based cross entropy as the objective function to optimize the model. Extensive experiments are conducted on four benchmarks. The results show that our model outperforms LaSAML by 1.88% and 2.28% on Banking77 under the 10-way-5-shot and 15-way-5-shot settings, respectively. Against the other baselines, 3DCPN achieves average improvements of 4.90%, 4.53% and 8.81% on Clinc150, Hwu64 and Liu57, respectively.
• A novel prototypical network (3DCPN) based on 3D convolution is proposed.
• A 3D CNN is used to extract class information from embeddings of the same class.
• The 3D CNN plays the role of a prototype generator in 3DCPN.
• A 2D convolution based module is designed to generate high-quality query features.
• Experiments on four benchmarks demonstrate the superiority of 3DCPN. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2023.120124 |
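
The summary contrasts 3DCPN with the usual prototypical-network pipeline, in which class prototypes are plain averages of support embeddings and a query is scored against them with a metric function before a temperature-scaled cross entropy is applied. The following is a minimal sketch of that averaging baseline only, not of the 3D CNN prototype generator or the 2D semantic mining layer. Cosine similarity, the temperature value `tau=0.1`, the toy 2-dimensional embeddings, and all function names are assumptions made for illustration; the record does not state which metric or temperature the authors use.

```python
import math

def mean_prototypes(support):
    """Baseline prototype: average the embeddings of each class.

    support maps a class label to a list of equal-length embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos

def cosine(a, b):
    """Cosine similarity (assumed metric; vectors must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def temperature_cross_entropy(query, true_label, protos, tau=0.1):
    """Cross entropy over similarities divided by a temperature tau (assumed value)."""
    labels = sorted(protos)
    logits = [cosine(query, protos[label]) / tau for label in labels]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[labels.index(true_label)]

# Toy 2-way-2-shot episode with hypothetical 2-dimensional embeddings.
support = {
    "card": [[1.0, 0.0], [0.8, 0.2]],
    "loan": [[0.0, 1.0], [0.2, 0.8]],
}
protos = mean_prototypes(support)
query = [0.9, 0.1]
loss_correct = temperature_cross_entropy(query, "card", protos)
loss_wrong = temperature_cross_entropy(query, "loan", protos)
```

In this sketch a smaller `tau` sharpens the softmax over similarities, so the loss for the correct label shrinks and the loss for a wrong label grows. A learned prototype generator such as the one the summary describes would replace `mean_prototypes` while leaving the scoring and loss steps in place.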