Comparing text pages using image features based on word positions

A signature for a page of text is generated. The signature serves as an identifier of the text page. Positions of words in a text page are determined. Positions of multiple second words in the text page are determined relative to the position of a first word in the text page. A signature value is ge...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Spasojevic, Nemanja L, Poncin, Guillaume, Bloomberg, Dan S
Format: Patent
Sprache:eng
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A signature for a page of text is generated. The signature serves as an identifier of the text page. Positions of words in a text page are determined. Positions of multiple second words in the text page are determined relative to the position of a first word in the text page. A signature value is generated that describes the second word positions relative to the first word position. The signature value is stored. Additional signatures for the text page can be generated, each signature describing positions of other words in the text page relative to a word in the text page for which the signature is being generated. The signatures can be used to compare the text page to another text page and generate a measure of similarity that describes the result of the comparison.