Method, device and equipment for checking text similarity degree and medium

Bibliographic details
Main authors: JI DAQI, ZAN YUNFEI, GAO XIANG, XU HONG, CHEN YUNWEN, SUN WU
Format: Patent
Language: chi; eng
Description
Summary: The invention discloses a method, a device, equipment and a medium for checking text similarity. The method comprises: vectorizing the texts in a to-be-checked sample set to obtain a text vector feature set; eliminating similar samples from each to-be-checked sample subset in the to-be-checked sample set according to a first similarity algorithm, a similarity threshold and the text vector feature set, to obtain preliminarily screened to-be-checked sample subsets; and performing word segmentation on the to-be-checked sample set to obtain a text word segmentation sample set, then eliminating similar samples from each preliminarily screened to-be-checked sample subset according to the text word segmentation sample set, a second similarity algorithm and a similarity threshold, to obtain target cleaned sample subsets. According to the technical scheme, the cleaning effect of t
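
The two-stage cleaning described in the summary can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record does not name the vectorization or either similarity algorithm, so the sketch assumes character-count vectors with cosine similarity for the first stage and whitespace tokens with Jaccard similarity for the second. All function names (`char_vector`, `drop_similar`, `clean`) and the threshold value 0.8 are hypothetical.

```python
import math
from collections import Counter


def char_vector(text):
    """Assumed vectorization: counts of each non-space character."""
    return Counter(text.replace(" ", "").lower())


def cosine(a, b):
    """Assumed first similarity algorithm: cosine of two count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Assumed second similarity algorithm: Jaccard index of token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_similar(items, features, similarity, threshold):
    """Keep an item only if it stays below the threshold against every kept item."""
    kept = []
    for item in items:
        if all(similarity(features[item], features[k]) < threshold for k in kept):
            kept.append(item)
    return kept


def clean(subsets, threshold=0.8):
    """Apply both elimination stages to every subset of the sample set."""
    texts = [t for subset in subsets.values() for t in subset]
    # Stage 1 features: one vector per text in the whole sample set.
    vectors = {t: char_vector(t) for t in texts}
    # Stage 2 features: assumed word segmentation by whitespace.
    tokens = {t: set(t.lower().split()) for t in texts}
    result = {}
    for name, subset in subsets.items():
        preliminary = drop_similar(subset, vectors, cosine, threshold)
        result[name] = drop_similar(preliminary, tokens, jaccard, threshold)
    return result
```

Calling `clean` on a dictionary that maps subset names to lists of texts returns the same keys with near-duplicate texts removed from each list. Whitespace splitting would not segment Chinese text; a real segmenter would be needed there, and the choice of one is outside what this record states.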