CNN-VAE: An intelligent text representation algorithm

Collecting and analyzing data from all devices to improve the efficiency of business processes is an important task of Industrial Internet of Things (IIoT). In the age of data explosion, extensive text data generated by the IIoT have given birth to a variety of text representation methods. The task...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	The Journal of supercomputing 2023-07, Vol.79 (11), p.12266-12291
Hauptverfasser:	Xu, Saijuan, Guo, Canyang, Zhu, Yuhan, Liu, Genggeng, Xiong, Neal
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Artificial neural networks Big Data Compilers Computer Science Industrial applications Internet of Things Interpreters Machine learning Natural language Natural language processing Normal distribution Processor Architectures Programming Languages Representations Semantics Support vector machines Words (language)
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Collecting and analyzing data from all devices to improve the efficiency of business processes is an important task of Industrial Internet of Things (IIoT). In the age of data explosion, extensive text data generated by the IIoT have given birth to a variety of text representation methods. The task of text representation is to convert the natural language to a form that computer can understand with retaining the original semantics. However, these methods are difficult to effectively extract the semantic features among words and distinguish polysemy in natural language. Combining the advantages of convolutional neural network (CNN) and variational autoencoder (VAE), this paper proposes an intelligent CNN-VAE text representation algorithm as an advanced learning method for social big data within next-generation IIoT, which help users identify the information collected by sensors and perform further processing. This method employs the convolution layer to capture the local features of the context and uses the variational technique to reconstruct feature space to make it conform to the normal distribution. In addition, the improved word2vec model based on topical word embedding (TWE) is utilized to add topical information to word vectors to distinguish polysemy. This paper takes the social big data as an example to illustrate the way of the proposed algorithm applied in the next-generation IIoT and utilizes Cnews dataset to verify the performance of proposed method with four evaluating metrics (i.e., recall, accuracy, precision, and F1-score). Experimental results indicate that the proposed method outperforms word2vec-avg and CNN-AE in K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classifiers and distinguishes polysemy effectively.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-023-05139-w