System and method for determining content similarity by comparing semantic entity attributes
Main authors: | , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Abstract: | A method for identifying documents whose content is similar to that of an input document includes receiving a request to identify similar documents from a plurality of candidate documents, and retrieving document classification attributes for the input document and the candidate documents, where the document classification attributes are document-level attributes. The method further includes comparing a document classification attribute of the input document to the document classification attribute of each candidate document to identify a subset of candidate documents having a matching document classification attribute; retrieving semantic entities from the input document and from the candidate documents in the subset; comparing the semantic entity attributes of the input document pairwise with the semantic entity attributes of the candidate documents in the subset to identify semantic entities with matching semantic attributes; calculating a content similarity score between the semantic entity of the input document and the seman |
---|
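
The two-stage flow described in the abstract (filter candidates by a document-level classification attribute, then compare semantic entity attributes pairwise) might be sketched roughly as follows. This is only an illustrative sketch, not the patented method: the abstract is cut off before the scoring step is defined, so the `similarity` function below is a stand-in assumption (the fraction of input entities that share at least one attribute with some candidate entity). The names `Entity`, `Document`, `similarity`, and `find_similar` are hypothetical, as is the rule that two entities "match" when their attribute sets overlap.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical semantic entity: a surface name plus a set of attributes."""
    name: str
    attributes: frozenset


@dataclass
class Document:
    """Hypothetical document: one document-level classification plus entities."""
    doc_id: str
    classification: str
    entities: list = field(default_factory=list)


def similarity(input_doc: Document, candidate: Document) -> float:
    """Assumed score: share of input entities matching any candidate entity.

    Two entities are treated as matching when their attribute sets overlap.
    The source abstract does not define the real score.
    """
    if not input_doc.entities or not candidate.entities:
        return 0.0
    matched = {
        ea
        for ea in input_doc.entities
        for eb in candidate.entities
        if ea.attributes & eb.attributes  # non-empty intersection
    }
    return len(matched) / len(input_doc.entities)


def find_similar(input_doc: Document, candidates: list) -> list:
    """Stage 1: keep candidates with the same classification.
    Stage 2: score the remaining candidates and sort by descending score."""
    subset = [c for c in candidates if c.classification == input_doc.classification]
    scored = [(c.doc_id, similarity(input_doc, c)) for c in subset]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Example data (invented for illustration only).
query = Document("q", "contract", [
    Entity("Acme", frozenset({"org"})),
    Entity("Berlin", frozenset({"place"})),
])
candidates = [
    Document("c1", "contract", [Entity("Acme Corp", frozenset({"org"}))]),
    Document("c2", "invoice", [Entity("Berlin", frozenset({"place"}))]),
    Document("c3", "contract", [
        Entity("X", frozenset({"org"})),
        Entity("Y", frozenset({"place"})),
    ]),
]
ranking = find_similar(query, candidates)
```

In this sketch, the classification filter is applied first so that the more expensive pairwise entity comparison runs only on the matching subset, which appears to be the point of the ordering described in the abstract.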