A theoretical and empirical exploration of TileTrans for effective tile pruning
Published in: Knowledge-Based Systems, 2024-10, Vol. 301, p. 112359, Article 112359
Format: Article
Language: English
Online access: Full text
Abstract: In this paper, we propose a reparameterization method that is capable of transforming the attention layer of deep neural networks (DNNs) to reduce the loss of tile pruning. The proposed method can effectively accelerate the inference of DNNs by augmenting the effects of tile pruning, achieving a better tradeoff between pruning loss and acceleration than other structured pruning methods. To realize this, we release our previously proposed heuristic method, TileTrans, from its restriction to linear and convolutional layers. We also provide a robust mathematical proof, which not only gives a rigorous theoretical basis for the heuristic algorithm but also underscores its consistent effectiveness in reducing tile pruning loss. This proof presumes a normal distribution of the weight elements in the pre-trained model, as commonly assumed in contemporary research. When evaluated on the question-answering natural language inference (QNLI) task, TileTrans improved the accuracy of the pruned BERT-Base model by up to 5.7%. Our findings also empirically indicate that the best pruning configurations of convolutional neural networks (CNNs) differ from those of transformer-based models. Specifically, the accuracies of CNNs and transformer-based models are most effectively improved by one-shot pruning and iterative pruning, respectively. These results are noteworthy because both pruning methods are widely applied to DNNs. They also highlight the importance of reducing the tile pruning loss on the attention layer, along with the losses on the linear and convolutional layers.
• We reparameterize the attention layer to reduce the tile pruning loss.
• Sorting weight matrix rows by row importance reduces the tile pruning loss.
• Reparameterization methods must cooperate with the correct pruning setting.
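The second highlight names the core mechanism: permuting weight-matrix rows so that tile-wise pruning removes less of the important weight mass. The Python sketch below is not the authors' TileTrans implementation; it only illustrates, under assumed choices (4x4 tiles, L1-norm tile and row importance, 50% tile sparsity), how row reordering interacts with tile pruning. A faithful reparameterization would also fold the inverse permutation into the adjacent layer so the network computes the same function.

import numpy as np

def tile_prune(weight, tile_shape=(4, 4), sparsity=0.5):
    # Zero out the fraction of tiles with the smallest L1 magnitude (illustrative tile pruning).
    th, tw = tile_shape
    rows, cols = weight.shape
    pruned = weight.copy()
    tiles = []
    for r in range(0, rows, th):
        for c in range(0, cols, tw):
            tiles.append((np.abs(weight[r:r + th, c:c + tw]).sum(), r, c))
    tiles.sort(key=lambda t: t[0])
    for _, r, c in tiles[:int(len(tiles) * sparsity)]:
        pruned[r:r + th, c:c + tw] = 0.0
    return pruned

def reorder_rows_by_importance(weight):
    # Sort rows by descending L1 norm so rows of similar importance share tiles.
    # A full reparameterization would apply the inverse permutation to the
    # adjacent layer to keep the model's output unchanged.
    order = np.argsort(-np.abs(weight).sum(axis=1))
    return weight[order], order

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))  # weight elements assumed normally distributed, as in the paper's proof
loss_plain = np.abs(W - tile_prune(W)).sum()
W_sorted, _ = reorder_rows_by_importance(W)
loss_sorted = np.abs(W_sorted - tile_prune(W_sorted)).sum()
print(f"tile pruning loss without reordering: {loss_plain:.3f}")
print(f"tile pruning loss with row reordering: {loss_sorted:.3f}")

Whether reordering lowers the loss for any single random matrix depends on its magnitude structure; the paper's proof argues the reduction holds under the normal-distribution assumption on the weights.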
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112359