Encoding entity representations for cross-document coreference

A computer-implemented method for performing cross-document coreference for a corpus of input documents includes determining mentions by parsing the input documents. Each mention includes a first vector for spelling data and a second vector for context data. A hierarchical tree data structure is cre...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Glass, Michael Robert, Farrell, Robert G, Rossiello, Gaetano, Gliozzo, Alfio Massimiliano, Monath, Nicholas Brady Garvan
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A computer-implemented method for performing cross-document coreference for a corpus of input documents includes determining mentions by parsing the input documents. Each mention includes a first vector for spelling data and a second vector for context data. A hierarchical tree data structure is created by generating several leaf nodes corresponding to respective mentions. Further, for each node, a similarity score is computed based on the first and second vectors of each node. The hierarchical tree is populated iteratively until a root node is created. Each iteration includes merging two nodes that have the highest similarity scores and creating an entity node instead at a hierarchical level that is above the two nodes being merged. Further, each iteration includes computing the similarity score for the entity node. The nodes with the similarity scores above a predetermined value are entities for which coreference has been performed in input documents.