Efficient convolutional neural network with multi-kernel enhancement features for real-time facial expression recognition
Saved in:
Published in: Journal of Real-Time Image Processing, 2021-12, Vol. 18 (6), pp. 2111-2122
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Facial expressions are the most direct external manifestation of personal emotions. Unlike other pattern recognition problems, the feature differences between facial expressions are small: general methods either fail to characterize these differences effectively or have too many parameters for real-time processing. This paper proposes a lightweight mobile architecture, a multi-kernel feature facial expression recognition network, that balances the speed and accuracy of real-time facial expression recognition. First, a multi-kernel convolution block is designed using three depthwise separable convolution kernels of different sizes in parallel; the small and large kernels extract local details and edge contour information of facial expressions, respectively. The multi-channel information is then fused to obtain multi-kernel enhancement features that better describe the differences between facial expressions. Second, a "Channel Split" operation is applied to the input of the multi-kernel convolution block, which avoids repeated extraction of invalid information and reduces the number of parameters to one-third of the original. Finally, a lightweight multi-kernel feature expression recognition network is built by alternating multi-kernel convolution blocks and depthwise separable convolutions to further improve feature representation ability. Experimental results show that the proposed network achieves accuracies of 73.3% and 99.5% on the FER-2013 and CK+ datasets, respectively, and runs at 78 frames per second on 640 × 480 video, outperforming other state-of-the-art methods in both speed and accuracy.
ISSN: 1861-8200, 1861-8219
DOI: 10.1007/s11554-021-01088-w
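The abstract's claim that "Channel Split" reduces the block's parameters to one-third can be illustrated with a small parameter-count calculation. This is a sketch, not the authors' code: the kernel sizes (3, 5, 7), the channel width C = 96, and the assumption that the three parallel branches are depthwise convolutions whose outputs are concatenated are all illustrative choices made here to make the arithmetic concrete.

```python
# Illustrative parameter-count sketch (assumptions: three parallel
# depthwise branches with kernel sizes 3/5/7; channel width 96; the
# paper's exact configuration is not reproduced here).

def depthwise_params(channels, k):
    """A depthwise k x k convolution has one k x k filter per input
    channel (bias terms omitted)."""
    return channels * k * k

def multi_kernel_block_params(channels, kernels, channel_split):
    """Parallel depthwise branches whose outputs are concatenated.
    Without 'Channel Split' every branch sees all channels; with it,
    the input is split so each branch sees channels // len(kernels)."""
    per_branch = channels // len(kernels) if channel_split else channels
    return sum(depthwise_params(per_branch, k) for k in kernels)

C, kernels = 96, (3, 5, 7)
full = multi_kernel_block_params(C, kernels, channel_split=False)
split = multi_kernel_block_params(C, kernels, channel_split=True)
print(full, split, full / split)  # → 7968 2656 3.0
```

Because each depthwise branch's cost is linear in its input channels, splitting the input three ways cuts each branch's parameters by three, giving the one-third reduction stated in the abstract while the concatenated output keeps the original channel width.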