Similar text recognizing processing method, device and system

Bibliographic details
Main authors: SUN LI, ZHONG QIWEI, LIANG ANYANG, TANG JIAYU
Format: Patent
Language: Chinese; English
Description
Summary: The embodiment of the invention discloses a processing method, device and system for recognizing similar texts. The method includes generating feature data of to-be-processed texts; segmenting the feature data to generate a group of segmented feature data; integrating the to-be-processed texts that correspond to the same segmented feature data in that group to generate a locally-similar text identification group; calculating the distance between every two elements in the locally-similar text identification group to obtain similar text pairs whose distances meet a set similarity threshold; and clustering the similar text pairs to obtain a recognized similar text group. According to the embodiment, similar texts can be recognized rapidly and effectively, the system processing bottleneck is relieved, and the system processing speed is increased.
本申请实施例公开了一种识别相似文本的处理方法、装置及系统。所述方法包括:生成待处理文本的特征数据;对所述特征数据进行分段,生成分段特征数据的集合;聚合所述分段特征数据的集合中相同分段特征数据对应的待处理文本,生成局部相似的文本标识集合;计算所述局部相似文本的集合中每
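The steps in the summary can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the record does not say what the feature data is, so the sketch assumes a 64-bit SimHash-style fingerprint, four 16-bit segments, Hamming distance as the distance measure, and union-find as the clustering step. All function names and parameters are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

BITS = 64                    # assumed fingerprint width
SEGMENTS = 4                 # assumed number of segments
SEG_BITS = BITS // SEGMENTS  # 16 bits per segment


def fingerprint(text):
    """Assumed feature data: a SimHash-style fingerprint over whitespace tokens."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def segments(fp):
    """Split a fingerprint into (segment index, segment value) keys."""
    mask = (1 << SEG_BITS) - 1
    return [(k, (fp >> (k * SEG_BITS)) & mask) for k in range(SEGMENTS)]


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similar_groups(fps, max_distance=3):
    """fps maps text id -> fingerprint; returns a list of sets of similar ids."""
    # Integrate texts that share the same segmented feature data.
    buckets = defaultdict(set)
    for tid, fp in fps.items():
        for key in segments(fp):
            buckets[key].add(tid)

    # Compare only texts within the same locally-similar group.
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            if hamming(fps[a], fps[b]) <= max_distance:
                pairs.add((a, b))

    # Cluster the similar pairs with union-find.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())
```

A caller would build the input with something like `similar_groups({tid: fingerprint(t) for tid, t in texts.items()})`. Under these assumptions the speed-up comes from bucketing: two fingerprints that differ in at most three bits must agree exactly on at least one of the four segments, so distances are computed only inside buckets instead of across every pair of texts.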