Research on topic discovery technology for Web news

Bibliographic details
Published in:	Neural Computing & Applications, 2020, Vol. 32 (1), pp. 73-83
Main authors:	Xu, Guixian; Yu, Ziheng; Wang, Changzhi; Wang, Antai
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Clustering; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Computer Science, Artificial Intelligence; Data mining; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Indexing; Information dissemination; Information retrieval; Mail order; News; Probability and Statistics in Computer Science; S.I.: Brain-Inspired Computing and Machine Learning for Brain Health; Science & Technology; Semantic analysis; Semantics; Service introduction; Technology; Texts; Webs; Words (language)
Online access:	Full text

Description
Summary:	With the development of information technology, Web news has become the main channel of information dissemination. Web news topic discovery helps users find valuable information quickly, and research on it continues to advance. Traditional topic discovery is based on the vector space model, which suffers from high dimensionality and data sparsity. Latent semantic analysis, by contrast, maps the high-dimensional, sparse word space to a k-dimensional semantic space and raises the similarity of news items on the same topic through the semantic correlation between words. This paper studies Web news topic discovery. First, the set of Web news texts is vectorized and the weight of each feature in the texts is calculated with an improved TF-IDF. The original text vectors are then analysed by latent semantic analysis, which exploits the semantic relations between texts and words, and the news topics are extracted by a clustering approach. Sub-topics are extracted and displayed through word co-occurrence: in essence, each sub-topic vector is built from these co-occurring words. The experimental results show that the proposed method can effectively capture the current hot topics of Web news and their related sub-topics. The work is relevant to information retrieval and data mining.
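The pipeline outlined in the summary (feature weighting, latent semantic analysis, clustering, word co-occurrence) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes plain smoothed TF-IDF rather than the paper's improved weighting, a simple k-means with farthest-point initialisation as the clustering step, and raw pair counts as the co-occurrence measure. The toy corpus and all function names (`tfidf`, `lsa`, `kmeans`, `cooccurring_pairs`) are hypothetical, and NumPy is assumed to be installed.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

# Toy corpus, invented for illustration (not data from the paper).
docs = [
    "earthquake rescue teams search survivors",
    "earthquake relief aid reaches survivors",
    "rescue teams deliver earthquake aid",
    "football league final match result",
    "football team wins league match",
    "league final football fans celebrate",
]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})
col = {w: i for i, w in enumerate(vocab)}


def tfidf(tokens):
    """Plain smoothed TF-IDF, rows L2-normalised (assumption: not the paper's variant)."""
    n = len(tokens)
    df = Counter(w for t in tokens for w in set(t))
    X = np.zeros((n, len(vocab)))
    for row, t in enumerate(tokens):
        for w, count in Counter(t).items():
            idf = math.log((1 + n) / (1 + df[w])) + 1.0
            X[row, col[w]] = (count / len(t)) * idf
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def lsa(X, k):
    """Project documents into a k-dimensional semantic space via truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


def kmeans(Z, k, iters=20):
    """Small k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def cooccurring_pairs(cluster_tokens, top=3):
    """Count word pairs that appear together in the same document of one topic."""
    pairs = Counter()
    for t in cluster_tokens:
        pairs.update(combinations(sorted(set(t)), 2))
    return pairs.most_common(top)


X = tfidf(tokens)          # documents x words, high-dimensional and sparse
Z = lsa(X, k=2)            # documents x k, dense semantic space
labels = kmeans(Z, k=2)    # one topic label per document
subtopics = {
    j: cooccurring_pairs([t for t, l in zip(tokens, labels) if l == j])
    for j in sorted(set(labels.tolist()))
}
print(labels.tolist())
print(subtopics)
```

In this toy setting the two groups of documents share no vocabulary, so the reduced space separates them cleanly. On real news text the choice of k, the weighting scheme, and the clustering method would all need tuning.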
ISSN:	0941-0643; 1433-3058
DOI:	10.1007/s00521-018-3744-2