Method, device and equipment for checking text similarity degree and medium
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Abstract: | The invention discloses a method, a device, equipment and a medium for checking text similarity. The method comprises: vectorizing the texts in a to-be-checked sample set to obtain a text vector feature set; performing similar-sample elimination on each to-be-checked sample subset in the to-be-checked sample set according to a first similarity algorithm, a similarity threshold and the text vector feature set, to obtain each preliminarily screened to-be-checked sample subset; and performing word segmentation on the to-be-checked sample set to obtain a text word-segmentation sample set, then performing similar-sample elimination on each preliminarily screened to-be-checked sample subset according to the text word-segmentation sample set, a second similarity algorithm and a similarity threshold, to obtain each target cleaned sample subset. According to the technical scheme, the cleaning effect of t |
---|
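The two-pass elimination described in the abstract can be illustrated with a minimal Python sketch. The record does not name the similarity algorithms, the vectorization, or the segmenter, so everything concrete here is an assumption made for illustration: character-bigram count vectors with cosine similarity stand in for the first pass, a whitespace split with Jaccard similarity stands in for the second pass, and the greedy keep-first rule, the 0.8 threshold, and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Assumed vectorization: character-bigram counts."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a, b):
    """Assumed first similarity algorithm: cosine similarity of count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def segment(text):
    """Assumed word segmentation: lowercase whitespace split into a token set."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Assumed second similarity algorithm: Jaccard similarity of token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def eliminate(ids, features, similarity, threshold):
    """Greedy elimination: keep a sample only if its similarity to every
    sample already kept in the same subset is below the threshold."""
    kept = []
    for i in ids:
        if all(similarity(features[i], features[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def clean(subsets, threshold=0.8):
    """subsets maps a subset name to a list of texts; returns cleaned subsets."""
    cleaned = {}
    for name, texts in subsets.items():
        ids = list(range(len(texts)))
        vectors = [vectorize(t) for t in texts]   # text vector features
        tokens = [segment(t) for t in texts]      # word-segmentation samples
        screened = eliminate(ids, vectors, cosine, threshold)     # first pass
        target = eliminate(screened, tokens, jaccard, threshold)  # second pass
        cleaned[name] = [texts[i] for i in target]
    return cleaned
```

Calling `clean({"faq": ["reset my password", "reset my password", "track my order"]})` would drop the repeated text and keep one copy of each distinct sample. Running the second pass only on the survivors of the first mirrors the order given in the abstract; whether the patent uses the same threshold for both passes is not stated in this record.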