Microblog hot topic data mining method
Main authors: | , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Abstract: | The invention relates to a method for mining microblog hot-topic data. The method first collects microblog hot-topic data. It then preprocesses the data: noise data is deleted, feature words suitable for similarity calculation are screened out, and the corpus is normalized to facilitate analysis. A BTM topic model is then fitted by iterating repeatedly over the preprocessed result, which yields the document-topic and topic-word probability distribution matrices; after text conversion, the modeling result expresses the features of each microblog post as a text vector. Finally, the modeling result is further optimized and clustered with the K-Means algorithm, which enhances topic discovery and yields well-differentiated topics.
本发明涉及微博热点话题数据挖掘方法,本发明首先采集微博热点话题数据;对数据进行预处理,删除噪音数据,筛选出适合相似度计算的特征词语并将语料库规范化以便进行数据分析;使用BTM主题模型进行建模,对预处理结果进行不断迭代,得到"文档-主题"和"主题-词语"的概率分布矩阵,将BTM模型建模分析后得到的结果经过文本转换后以文本向量 |
---|
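The pipeline in the abstract (biterm modeling, document-topic vectors, K-Means) can be illustrated with a minimal sketch. This is not the patent's implementation: the function names (`biterms`, `fit_btm`, `doc_topic`, `kmeans`), the hyperparameters `alpha` and `beta`, the iteration counts, and the toy token lists are all assumptions made for illustration, and preprocessing is assumed to have already produced clean token lists. The sampler uses the common simplified form of the BTM Gibbs update.

```python
import random
from itertools import combinations


def biterms(tokens):
    """Return every unordered word pair that co-occurs in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def fit_btm(docs, k, iters=100, alpha=1.0, beta=0.01, seed=0):
    """Collapsed Gibbs sampling over biterms; returns (theta, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    all_b = [b for doc in docs for b in biterms(doc)]
    z = [rng.randrange(k) for _ in all_b]               # topic of each biterm
    n_z = [0] * k                                       # biterms per topic
    n_wz = [dict.fromkeys(vocab, 0) for _ in range(k)]  # word counts per topic
    for (w1, w2), t in zip(all_b, z):
        n_z[t] += 1
        n_wz[t][w1] += 1
        n_wz[t][w2] += 1
    mb = len(vocab) * beta
    for _ in range(iters):
        for i, (w1, w2) in enumerate(all_b):
            old = z[i]                                  # remove current assignment
            n_z[old] -= 1
            n_wz[old][w1] -= 1
            n_wz[old][w2] -= 1
            weights = [
                (n_z[t] + alpha) * (n_wz[t][w1] + beta) * (n_wz[t][w2] + beta)
                / ((2 * n_z[t] + mb) * (2 * n_z[t] + 1 + mb))
                for t in range(k)
            ]
            new = rng.choices(range(k), weights)[0]     # resample the topic
            z[i] = new
            n_z[new] += 1
            n_wz[new][w1] += 1
            n_wz[new][w2] += 1
    # theta: corpus-level topic distribution; phi: topic-word distributions.
    theta = [(n_z[t] + alpha) / (len(all_b) + k * alpha) for t in range(k)]
    phi = [{w: (n_wz[t][w] + beta) / (2 * n_z[t] + mb) for w in vocab}
           for t in range(k)]
    return theta, phi


def doc_topic(doc, theta, phi):
    """P(z|d) as the average of P(z|b) over the document's biterms."""
    k = len(theta)
    vec = [0.0] * k
    bs = biterms(doc)
    for w1, w2 in bs:
        p = [theta[t] * phi[t][w1] * phi[t][w2] for t in range(k)]
        total = sum(p)
        for t in range(k):
            vec[t] += p[t] / total / len(bs)
    return vec


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-Means with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:                                 # keep old center if empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical, already-preprocessed microblog posts (token lists).
docs = [["phone", "battery", "screen"], ["battery", "charger", "phone"],
        ["match", "goal", "team"], ["team", "coach", "goal"]]
theta, phi = fit_btm(docs, k=2)
vectors = [doc_topic(d, theta, phi) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

Modeling word pairs pooled over the whole corpus, rather than words within each document, is what lets BTM cope with very short texts; the K-Means step then groups posts by their inferred topic vectors. On a corpus this small the sampled topics vary with the seed, so the printed labels are only indicative.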