Granformer: A granular transformer net with linear complexity

Bibliographic details
Published in:	Neurocomputing (Amsterdam) 2024-11, Vol.606, p.128380, Article 128380
Main authors:	Wang, Kaili; Sun, Xinwei; Shen, Tao
Format:	Article
Language:	English
Subjects:	Granular attention; Intelligent applications; Linear complexity; Transformer

Description
Abstract:	Recently, transformer models have demonstrated excellent performance across various intelligent applications owing to their ability to understand global context through the self-attention mechanism. However, the extensively investigated multiplicative attention mechanism is inadequate for capturing relationship-based feature representations, as the dot product cannot depict the intricate semantic information between objects. Moreover, it carries a heavy computational burden, with O(n^2) complexity, because the global feature representation is obtained by calculating the relationship between every pair of tokens in the feature sequence. To address these problems, this paper proposes a granular transformer framework with linear complexity, in which diverse granulation functions can replace the prevailing multiplicative relationships, and a linearization method based on matrix factorization is designed to reduce the computational burden. Because it relies on the intricate semantic information embedded in granular structures, the framework's feature extraction is significantly more comprehensive. A novel matrix factorization method is then developed to linearize granulation-based attention; it applies separate deformable convolution sampling and uses an approximate iterative algorithm based on cubic equations to calculate the Moore–Penrose inverse. A mathematical proof that the method approximates the complete granulation-based attention matrix is given in detail. Finally, the performance of Granformer, a plug-and-play reconfiguration of the transformer block, is evaluated on representative intelligent applications, including 3D point cloud classification, emotion recognition and sentiment analysis, and object detection. The experimental results suggest that the proposed methods outperform state-of-the-art models.
ISSN:	0925-2312
DOI:	10.1016/j.neucom.2024.128380
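
Illustrative note: the abstract mentions computing the Moore–Penrose inverse with an approximate iterative algorithm. The record does not give the paper's update rule, so the sketch below swaps in a standard third-order hyperpower iteration, Z_{k+1} = Z_k (3I - A Z_k (3I - A Z_k)), which may differ from the authors' scheme. The function names, the starting guess A^T / (||A||_1 ||A||_inf), and the iteration count are assumptions chosen for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def three_i_minus(mat):
    """Return 3*I - mat for a square matrix."""
    size = len(mat)
    return [[(3.0 if i == j else 0.0) - mat[i][j] for j in range(size)]
            for i in range(size)]


def pinv_hyperpower(a, iters=20):
    """Approximate the Moore-Penrose inverse of `a` (hypothetical helper).

    Uses the third-order update Z <- Z (3I - AZ (3I - AZ)). This is a
    stand-in for illustration, not the algorithm from the paper.
    """
    # Scale A^T so the starting guess lies inside the convergence region:
    # ||A||_2^2 <= ||A||_1 * ||A||_inf.
    norm_1 = max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))
    norm_inf = max(sum(abs(x) for x in row) for row in a)
    scale = norm_1 * norm_inf
    z = [[x / scale for x in row] for row in transpose(a)]
    for _ in range(iters):
        az = matmul(a, z)
        z = matmul(z, three_i_minus(matmul(az, three_i_minus(az))))
    return z


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    z = pinv_hyperpower(a)
    # For a pseudoinverse Z, the product A Z A should reproduce A.
    print(matmul(matmul(a, z), a))
```

Each step uses only matrix products, which is why iterations of this kind are attractive inside attention layers: applied to a small sampled matrix rather than the full n-by-n attention matrix, the cost of the inverse no longer grows quadratically with sequence length. How Granformer selects that small matrix (the abstract mentions deformable convolution sampling) is not reproduced here.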