Extracting entity profiles from semistructured information spaces
Published in: | SIGMOD Record 1997-12, Vol.26 (4), p.32-38 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | A semistructured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because each collection has its own schema, and there are no enforced keys or formats for data items across collections. Thus, structured methods like SQL cannot be easily employed, and users often must make do with only full-text search. In this paper, we describe an approach that provides structured querying for particular types of entities, such as companies and people. Entity-based retrieval is enabled by normalizing entity references in a heuristic, type-dependent manner. The approach can be used to retrieve documents and can also be used to construct entity profiles, that is, summaries of commonly sought information about an entity based on the documents' content. The approach requires only a modest amount of meta-information about the source collections, much of which is derived automatically. |
---|---|
ISSN: | 0163-5808 |
DOI: | 10.1145/271074.271083 |
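
The abstract describes normalizing entity references in a heuristic, type-dependent manner so that documents from differently structured collections can be retrieved by entity. The following is a minimal sketch of that general idea, not the paper's actual method: the suffix list, the two normalizer functions, and the `build_index` helper are all hypothetical names and rules chosen for illustration, since the abstract does not specify the heuristics used.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes; the paper's real rules are not given.
COMPANY_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "limited", "llc",
}


def normalize_company(name):
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in COMPANY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_person(name):
    """Reorder 'Last, First' and keep only the first and last name tokens."""
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", name.lower())
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0]
    # Middle names and initials are dropped (an assumed heuristic).
    return f"{tokens[0]} {tokens[-1]}"


# Type-dependent dispatch: each entity type gets its own normalizer.
NORMALIZERS = {"company": normalize_company, "person": normalize_person}


def build_index(references):
    """Group document ids by (entity type, normalized name).

    `references` is an iterable of (doc_id, entity_type, raw_name) tuples,
    an assumed input shape standing in for references extracted from
    fielded or tagged sections of the source collections.
    """
    index = defaultdict(list)
    for doc_id, entity_type, raw_name in references:
        key = (entity_type, NORMALIZERS[entity_type](raw_name))
        index[key].append(doc_id)
    return dict(index)


references = [
    ("d1", "company", "Acme Corp."),
    ("d2", "company", "ACME Corporation"),
    ("d3", "person", "Smith, John Q."),
    ("d4", "person", "John Smith"),
]
index = build_index(references)
```

In this sketch, the documents grouped under one normalized key would be the raw material for an entity profile; how the paper actually selects and summarizes the commonly sought information is not stated in the abstract.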