METHOD AND SYSTEM FOR DETECTING DUPLICATE DOCUMENT USING VECTOR QUANTIZATION
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | Disclosed is a method and system for detecting a duplicate document using vector quantization. A duplicate document detection method may include acquiring, by processing circuitry, a respective vector expression for each of a plurality of documents using a similarity model, the similarity model being trained to output similar vector expressions for semantically similar documents, generating a key by performing a vector quantization on the respective vector expression, the key including a binary character string, and detecting a duplicate document from among the plurality of documents using the key. |
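
The three steps named in the summary (vector expression, quantization into a binary-string key, duplicate detection by key) can be illustrated with a minimal Python sketch. The record does not state which quantization scheme the patent uses, so this sketch assumes the simplest one: one bit per dimension, set by comparing each component with a threshold. The function names `quantize_to_key` and `group_duplicates`, the threshold of 0.0, and the sample vectors are all hypothetical; the hard-coded vectors stand in for the output of the trained similarity model, which is not reproduced here.

```python
from collections import defaultdict


def quantize_to_key(vector, thresholds=None):
    """Map a real-valued vector to a binary character string.

    Assumed scheme (not taken from the patent): one bit per dimension,
    '1' if the component exceeds its threshold, otherwise '0'.
    """
    if thresholds is None:
        thresholds = [0.0] * len(vector)
    return "".join("1" if v > t else "0" for v, t in zip(vector, thresholds))


def group_duplicates(vectors_by_doc):
    """Bucket document ids by key and return buckets holding more than one id."""
    buckets = defaultdict(list)
    for doc_id, vector in vectors_by_doc.items():
        buckets[quantize_to_key(vector)].append(doc_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


# Hypothetical vector expressions standing in for the similarity model's output.
embeddings = {
    "doc_a": [0.91, -0.20, 0.35, -0.77],
    "doc_b": [0.88, -0.25, 0.31, -0.80],  # close to doc_a in every dimension
    "doc_c": [-0.40, 0.65, 0.10, 0.22],
}

duplicate_groups = group_duplicates(embeddings)
print(duplicate_groups)
```

Because semantically similar documents are expected to receive similar vectors, they tend to fall into the same bucket, so candidate duplicates are found by exact key lookup instead of pairwise vector comparison. Under this assumed scheme, vectors that straddle a threshold can still land in different buckets, which a real system would need to handle.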