The effect of activation functions on accuracy, convergence speed, and misclassification confidence in CNN text classification: a comprehensive exploration

Detailed description

Bibliographic details
Published in:	The Journal of supercomputing 2024, Vol.80 (1), p.292-312
Main authors:	Emanuel, Rebecca H. K.; Docherty, Paul D.; Lunt, Helen; Möller, Knut
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Classification; Compilers; Computer Science; Convergence; Datasets; Hyperbolic functions; Interpreters; Processor Architectures; Programming Languages; Text categorization
Online access:	Full text

Description
Summary:	Convolutional neural networks (CNNs) have become a useful tool for a wide range of applications such as text classification. However, CNNs are not always sufficiently accurate to be useful in certain applications. The selection of activation functions within CNN architecture can affect the efficacy of the CNN. However, there is limited research regarding which activation functions are best for CNN text classification. This study tested sixteen activation functions across three text classification datasets and six CNN structures, to determine the effects of activation function on accuracy, iterations to convergence, and Positive Confidence Difference (PCD). PCD is a novel metric introduced to compare how activation functions affected a network’s classification confidence. Tables were presented to compare the performance of the activation functions across the different CNN architectures and datasets. Top performing activation functions across the different tests included the symmetrical multi-state activation function, sigmoid, penalised hyperbolic tangent, and generalised swish. An activation function’s PCD was the most consistent evaluation metric during activation function assessment, implying a close relationship between activation functions and network confidence that has yet to be explored.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-023-05441-7