Extracting entity profiles from semistructured information spaces

Bibliographic details
Published in: SIGMOD Record, December 1997, Vol. 26 (4), pp. 32-38
Authors: Nado, Robert A.; Huffman, Scott B.
Format: Article
Language: English
Description
Abstract: A semistructured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because each collection has its own schema, and there are no enforced keys or formats for data items across collections. Thus, structured methods like SQL cannot be easily employed, and users often must make do with only full-text search. In this paper, we describe an approach that provides structured querying for particular types of entities, such as companies and people. Entity-based retrieval is enabled by normalizing entity references in a heuristic, type-dependent manner. The approach can be used to retrieve documents and can also be used to construct entity profiles: summaries of commonly sought information about an entity based on the documents' content. The approach requires only a modest amount of meta-information about the source collections, much of which is derived automatically.
ISSN:0163-5808
DOI:10.1145/271074.271083
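The abstract states only that entity references are normalized "in a heuristic, type-dependent manner" and that profiles are built from document content using a modest amount of per-collection meta-information. The Python sketch below illustrates that general idea under our own assumptions; it is not the authors' implementation. The company-suffix list, the "last name plus first initial" rule for people, the `field_types` mapping (standing in for the collection meta-information), the sample data, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: legal-form suffixes dropped from company names.
COMPANY_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                    "co", "company", "ltd", "llc", "plc"}


def normalize_company(name: str) -> str:
    """Lowercase, tokenize, and strip trailing legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9&]+", name.lower())
    while tokens and tokens[-1] in COMPANY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_person(name: str) -> str:
    """Reduce 'Last, First' or 'First Last' to 'last f' (hypothetical rule)."""
    name = name.strip()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        if not parts:
            return ""
        last, first = parts[-1], " ".join(parts[:-1])
    return f"{last.lower()} {first[:1].lower()}".strip()


# Type-dependent dispatch: each entity type gets its own normalizer.
NORMALIZERS = {"company": normalize_company, "person": normalize_person}


def build_profiles(docs, field_types):
    """Group documents from heterogeneous collections by normalized entity key.

    docs: iterable of (collection, doc_id, fields), where fields maps
          field name -> text.
    field_types: hypothetical per-collection meta-information mapping
          (collection, field) -> entity type.
    """
    profiles = defaultdict(lambda: {"documents": set(), "variants": set()})
    for collection, doc_id, fields in docs:
        for field, value in fields.items():
            etype = field_types.get((collection, field))
            if etype is None:
                continue  # field is not known to hold an entity reference
            key = (etype, NORMALIZERS[etype](value))
            profiles[key]["documents"].add(doc_id)
            profiles[key]["variants"].add(value)  # surface forms seen
    return dict(profiles)


# Illustrative, made-up collections with different schemas.
docs = [
    ("news", "n1", {"company": "Acme Corp.", "headline": "Acme opens plant"}),
    ("filings", "f7", {"registrant": "ACME Corporation"}),
    ("news", "n2", {"author": "Smith, John"}),
    ("bios", "b3", {"name": "John Smith"}),
]
field_types = {
    ("news", "company"): "company",
    ("filings", "registrant"): "company",
    ("news", "author"): "person",
    ("bios", "name"): "person",
}

profiles = build_profiles(docs, field_types)
print(sorted(profiles[("company", "acme")]["documents"]))  # ['f7', 'n1']
print(sorted(profiles[("person", "smith j")]["documents"]))  # ['b3', 'n2']
```

Because the normalized key is only a heuristic, distinct entities can collide (for example, two different people reduced to "smith j"); that is the price of matching references across collections that have no enforced keys.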