COMPARING TEXT BASED DOCUMENTS
Format: Patent
Language: English
Abstract: Text-based documents are compared by lexically normalising each word of the text of a first document (104) to form a first normalised representation. A vector representation of the first document is built (206) from the first normalised representation. Each word of the text of a second document (110) is lexically normalised to form a second normalised representation. A vector representation of the second document is built (204) from the second normalised representation. The alignment of the vector representations is compared (210) to produce a score (218) of the similarity of the second document to the first document.
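The steps in the abstract (normalise each word, build a vector per document, compare the alignment of the vectors to get a score) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method. The abstract does not say how words are lexically normalised, how the vectors are built, or how alignment is measured. The sketch therefore assumes lower-casing plus a crude suffix-stripping stemmer for normalisation, term-frequency counts for the vectors, and cosine similarity as the alignment measure. All function names (`normalise_word`, `normalise_text`, `build_vector`, `alignment_score`, `compare_documents`) are hypothetical.

```python
import math
import re
from collections import Counter

# ASSUMPTION: the patent does not specify its lexical normalisation; this
# hypothetical suffix list drives a deliberately crude stemmer.
_SUFFIXES = ("ing", "ed", "es", "s")


def normalise_word(word: str) -> str:
    """Lower-case a word, drop non-alphanumeric characters, strip one common suffix."""
    w = re.sub(r"[^a-z0-9]", "", word.lower())
    for suffix in _SUFFIXES:
        # Only strip the suffix if a stem of at least 3 characters remains.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalise_text(text: str) -> list[str]:
    """Lexically normalise each word of a document (its 'normalised representation')."""
    words = (normalise_word(w) for w in text.split())
    return [w for w in words if w]  # discard tokens that were pure punctuation


def build_vector(tokens: list[str]) -> Counter:
    """ASSUMPTION: a sparse term-frequency vector stands in for the vector representation."""
    return Counter(tokens)


def alignment_score(v1: Counter, v2: Counter) -> float:
    """ASSUMPTION: cosine of the angle between the vectors measures their alignment.

    1.0 means the vectors point the same way; 0.0 means no shared terms.
    """
    # Counter returns 0 for missing keys, so terms absent from v2 add nothing.
    dot = sum(count * v2[term] for term, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm1 * norm2)


def compare_documents(first: str, second: str) -> float:
    """Score the similarity of the second document to the first."""
    v_first = build_vector(normalise_text(first))
    v_second = build_vector(normalise_text(second))
    return alignment_score(v_first, v_second)
```

For example, `compare_documents("Comparing documents", "documents compared")` scores (up to floating-point rounding) 1.0, because both texts normalise to the same terms `compar` and `document`. Cosine similarity is symmetric, so in this sketch the score of the second document against the first equals the reverse. The abstract's wording ("similarity of the second document to the first") leaves open whether the patented score is directional.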