Document topic extraction method based on introduction of adaptive window into HDP model

Bibliographic details
Main authors: ZENG YE, LUO YU, CHANG JINPENG, PENG WANWAN, WU XIAOHUA
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a document topic extraction method that introduces an adaptive window into an HDP (hierarchical Dirichlet process) model. The method combines the HDP model with ADWIN (adaptive windowing) and detects topic drift through changes in likelihood. When no topic drift is detected, the windows are merged directly and the model is updated; when topic drift occurs, the window is divided into sub-windows and the method re-determines whether the windows should be merged. The model uses the adaptive window to divide a document into smaller document blocks, preserves the order of words by sliding the window, and selects the partition of model-training windows adaptively, thereby avoiding the arbitrary time slices and document blocks that most methods must define.
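The merge-or-split control loop described in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the hypothetical UnigramModel class (a Dirichlet-smoothed unigram model) stands in for the HDP model; drift is assumed to mean that a new block's average log-likelihood falls more than a fixed, hypothetical threshold below the window's in-sample average (the abstract does not state the actual test); and "dividing into sub-windows" is interpreted as dropping the oldest blocks until the new block fits. A full ADWIN detector would instead compare sub-window statistics against a statistical confidence bound.

```python
import math
from collections import Counter


def split_into_blocks(tokens, block_size):
    """Cut a token stream into consecutive blocks; word order is preserved."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


class UnigramModel:
    """Hypothetical stand-in for an HDP topic model: a Dirichlet-smoothed unigram."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.counts = Counter()
        self.total = 0

    def update(self, block):
        self.counts.update(block)
        self.total += len(block)

    def avg_log_likelihood(self, block, vocab_size):
        # Mean per-word log-probability of `block` under the smoothed counts.
        denom = self.total + self.alpha * vocab_size
        return sum(math.log((self.counts[w] + self.alpha) / denom) for w in block) / len(block)


def process_stream(tokens, block_size=4, threshold=1.0, alpha=0.1):
    """Group blocks into training windows; `threshold` is an assumed drift cut-off in nats."""
    vocab_size = len(set(tokens))
    windows, current = [], []  # closed windows, active window (each a list of blocks)

    def fit(blocks):
        model = UnigramModel(alpha)
        for b in blocks:
            model.update(b)
        return model

    def drifted(blocks, block):
        # Assumed drift test: the new block scores much worse than the window's own blocks.
        model = fit(blocks)
        baseline = sum(model.avg_log_likelihood(b, vocab_size) for b in blocks) / len(blocks)
        return baseline - model.avg_log_likelihood(block, vocab_size) > threshold

    for block in split_into_blocks(tokens, block_size):
        if not current or not drifted(current, block):
            current.append(block)  # no drift: merge the block into the active window
            continue
        # Drift: re-test against shorter sub-windows, dropping the oldest blocks first.
        for start in range(1, len(current)):
            if not drifted(current[start:], block):
                windows.append(current[:start])      # close the stale prefix
                current = current[start:] + [block]  # merge into the recent sub-window
                break
        else:
            windows.append(current)  # no sub-window fits: start a fresh window
            current = [block]
    if current:
        windows.append(current)
    return windows


if __name__ == "__main__":
    text = "apple banana " * 4 + "stock market " * 4
    for i, window in enumerate(process_stream(text.split())):
        print(i, window)
```

With block_size=4, the sample stream splits into one window holding the two apple/banana blocks and a second window holding the two stock/market blocks: the third block's average log-likelihood under the first window is about 3.7 nats below the window's own average, which exceeds the assumed threshold of 1.0, and no shorter sub-window accommodates it either.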