COMPARING TEXT BASED DOCUMENTS

Bibliographic details
Main authors: WILLIAMS ROBERT FRANCIS, DREHER HEINZ
Format: Patent
Language: English
Description
Summary: Text-based documents are compared by lexically normalising each word of the text of a first document (104) to form a first normalised representation. A vector representation of the first document is built (206) from the first normalised representation. Each word of the text of a second document (110) is lexically normalised to form a second normalised representation. A vector representation of the second document is built (204) from the second normalised representation. The alignment of the vector representations is compared (210) to produce a score (218) of the similarity of the second document to the first document.
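
The three steps in the abstract (lexical normalisation, vector building, alignment comparison) can be illustrated with a minimal Python sketch. The abstract does not say how words are normalised, what the vector components are, or how alignment is measured, so everything below is an assumption made for illustration: normalisation is lowercasing plus crude suffix stripping, the vector is a bag of word counts, and alignment is taken to be the cosine of the angle between the two vectors. The function names normalise, build_vector, alignment_score and compare_documents are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def normalise(text):
    """Lexically normalise each word (assumed scheme: lowercase + suffix stripping)."""
    words = re.findall(r"[a-z]+", text.lower())
    normalised = []
    for word in words:
        for suffix in ("ing", "ed", "s"):
            # Strip a suffix only when a stem of at least three letters remains.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        normalised.append(word)
    return normalised


def build_vector(normalised_words):
    """Build a sparse vector of term counts (assumed representation)."""
    return Counter(normalised_words)


def alignment_score(vec_a, vec_b):
    """Cosine of the angle between two count vectors (assumed alignment measure)."""
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


def compare_documents(first_text, second_text):
    """Score the similarity of the second document to the first."""
    first_vec = build_vector(normalise(first_text))
    second_vec = build_vector(normalise(second_text))
    return alignment_score(first_vec, second_vec)
```

Under these assumptions, compare_documents(first_text, second_text) returns a value between 0 and 1, where a higher value means the two count vectors point in more nearly the same direction. A real system would likely use a richer normalisation step (for example a thesaurus or stemmer) than the suffix stripping shown here.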