Evolutionary neural architecture search combining multi-branch ConvNet and improved transformer

Deep convolutional neural networks (CNNs) have achieved promising performance in the field of deep learning, but the manual design turns out to be very difficult due to the increasingly complex topologies of CNNs. Recently, neural architecture search (NAS) methods have been proposed to automatically...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Scientific reports 2023-09, Vol.13 (1), p.15791-15791, Article 15791
Hauptverfasser: Xu, Yang, Ma, Yongjie
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Deep convolutional neural networks (CNNs) have achieved promising performance in the field of deep learning, but the manual design turns out to be very difficult due to the increasingly complex topologies of CNNs. Recently, neural architecture search (NAS) methods have been proposed to automatically design network architectures, which are superior to handcrafted counterparts. Unfortunately, most current NAS methods suffer from either highly computational complexity of generated architectures or limitations in the flexibility of architecture design. To address above issues, this article proposes an evolutionary neural architecture search (ENAS) method based on improved Transformer and multi-branch ConvNet. The multi-branch block enriches the feature space and enhances the representational capacity of a network by combining paths with different complexities. Since convolution is inherently a local operation, a simple yet powerful “batch-free normalization Transformer Block” (BFNTBlock) is proposed to leverage both local information and long-range feature dependencies. In particular, the design of batch-free normalization (BFN) and batch normalization (BN) mixed in the BFNTBlock blocks the accumulation of estimation shift ascribe to the stack of BN, which has favorable effects for performance improvement. The proposed method achieves remarkable accuracies, 97.24 % and 80.06 % on CIFAR10 and CIFAR100, respectively, with high computational efficiency, i.e. only 1.46 and 1.53 GPU days. To validate the universality of our method in application scenarios, the proposed algorithm is verified on two real-world applications, including the GTSRB and NEU-CLS dataset, and achieves a better performance than common methods.
ISSN:2045-2322
2045-2322
DOI:10.1038/s41598-023-42931-3