Granformer: A granular transformer net with linear complexity
Published in: | Neurocomputing (Amsterdam) 2024-11, Vol.606, p.128380, Article 128380 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Recently, transformer models have demonstrated excellent performance across various intelligent applications owing to their ability to capture global context through the self-attention mechanism. However, the widely studied multiplicative attention mechanism is inadequate for capturing relationship-based feature representations, since the dot product cannot represent the intricate semantic information between objects. Moreover, it imposes a heavy computational burden with O(n²) complexity, because its global feature representation is obtained by computing the relationship between every pair of tokens in the entire feature sequence. To address these problems, this paper proposes a granular transformer framework with linear complexity, in which diverse granulation functions can replace the prevailing multiplicative relationships, and a novel linearization method in the form of matrix factorization is designed to reduce the computational burden. By exploiting the intricate semantic information embedded in granular structures, the model extracts features significantly more comprehensively. A novel matrix factorization method is then developed to linearize granulation-based attention; it applies separate deformable convolution sampling and computes the Moore–Penrose inverse with an approximate iterative algorithm based on cubic equations. A detailed mathematical proof shows that our method approximates the complete granulation-based attention matrix. Finally, Granformer, a plug-and-play reconfiguration of the transformer block, is evaluated on representative intelligent applications, including 3D point cloud classification, emotion recognition and sentiment analysis, and object detection. The experimental results suggest that our methods outperform state-of-the-art models. |
ISSN: | 0925-2312 |
DOI: | 10.1016/j.neucom.2024.128380 |
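
The abstract mentions computing the Moore–Penrose inverse with an approximate iterative algorithm based on cubic equations but does not state the update rule. As a hedged illustration only, the sketch below uses a standard third-order hyperpower iteration, X_{k+1} = X_k(3I − 3AX_k + (AX_k)²), whose residual I − AX_k is cubed at every step. This iteration is an assumed stand-in and not necessarily the scheme used in Granformer; the function name `iterative_pinv` and the default `num_iters=15` are hypothetical. The sketch assumes NumPy and a square nonsingular or full-row-rank input, such as the small landmark matrix that Nyström-style attention approximations invert.

```python
import numpy as np


def iterative_pinv(a: np.ndarray, num_iters: int = 15) -> np.ndarray:
    """Approximate the Moore-Penrose inverse of `a` with a third-order
    hyperpower iteration (hypothetical stand-in, not Granformer's exact scheme).

    Assumes `a` is square nonsingular or has full row rank; for rank-deficient
    inputs, rounding errors can grow under this iteration.
    """
    m = a.shape[0]
    eye = np.eye(m)
    # Initial guess X_0 = alpha * A^T with alpha = 1 / (||A||_1 * ||A||_inf).
    # Since sigma_max^2 <= ||A||_1 * ||A||_inf, the eigenvalues of A @ X_0 lie in (0, 1].
    x = a.T / (np.linalg.norm(a, 1) * np.linalg.norm(a, np.inf))
    for _ in range(num_iters):
        ax = a @ x
        # X_{k+1} = X_k (3I - 3 A X_k + (A X_k)^2); this makes the residual
        # I - A X_{k+1} equal to (I - A X_k)^3, i.e. cubic convergence.
        x = x @ (3.0 * eye - 3.0 * ax + ax @ ax)
    return x


if __name__ == "__main__":
    a = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    # Compare the iterative approximation with NumPy's SVD-based pseudoinverse.
    print(np.allclose(iterative_pinv(a), np.linalg.pinv(a)))
```

Each step uses only matrix products, which suits GPU execution. In landmark-based factorizations of this kind (assumed here), the inverted matrix is m × m with m much smaller than the sequence length n, so its cost does not grow with n, and the remaining products can be evaluated in time linear in n.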