Text Schema Mining Using Graphs and Formal Concept Analysis

Bibliographic details
Main authors: Gatzemeier, Felix H.; Meyer, Oliver
Format: Book chapter
Language: English
Online access: Full text
Description
Summary: This paper presents an investigation into finding and evaluating schemata through formal concept analysis. Schemata are used in conceptual authoring support to provide proven building blocks of text structures. As only a few schemata are available so far, ways to mine them from the structures of existing texts seem worthwhile. The general process begins with the structure of a text as a graph, transforms this graph into a formal context, and examines the formal concept lattice of that context. Formal concepts with large extents, in particular, may be candidates for schemata. Three alternative kinds of transformation are presented: (1) Natural: Wille's natural transformation produces contexts based mainly on type and connection information. (2) Schema-derived: these transformations derive attributes that identify partial or complete instances from a set of schemata. (3) Informal: starting from a set of schemata, conditions are formulated manually that may be present in the instance graph and contribute to the presence of such schemata. The document structures considered consist of a hierarchy of sections and subsections, which may import and export topics. The topics are interconnected in a conceptual graph called the topic map. Results of processing two such structures with the natural transformation and an informal one are reported. Some notes on the implementation in the Chasid prototype are given.
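The process outlined in the summary (graph, then formal context, then concept lattice, then large-extent concepts as candidates) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy section/topic graph, the attribute names (`type:…`, `out:…`, `in:…`), and the helper functions are hypothetical and are not taken from the paper or from the Chasid prototype. The attribute derivation only loosely follows the idea of using type and connection information and is not Wille's natural transformation as defined in the literature. The concept enumeration is a brute-force closure over all object subsets, which is feasible only for very small contexts.

```python
from itertools import combinations

# Hypothetical toy document graph (not from the paper): node name -> node type.
nodes = {
    "sec1": "Section",
    "sec2": "Section",
    "sec3": "Section",
    "topicA": "Topic",
    "topicB": "Topic",
}
# Labelled edges (source, label, target); the labels are illustrative.
edges = [
    ("sec1", "exports", "topicA"),
    ("sec2", "imports", "topicA"),
    ("sec2", "exports", "topicB"),
    ("sec3", "imports", "topicB"),
]


def build_context(nodes, edges):
    """Build a formal context: each node (object) gets attributes
    derived from its type and from its incoming/outgoing edge labels."""
    context = {name: {f"type:{typ}"} for name, typ in nodes.items()}
    for source, label, target in edges:
        context[source].add(f"out:{label}")
        context[target].add(f"in:{label}")
    return context


def common_attributes(context, objects):
    """Attributes shared by all given objects (every attribute if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(context, attributes):
    """Objects that carry all of the given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.
    Exponential in the number of objects; suitable for toy inputs only."""
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(context, subset)
            extent = objects_having(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


context = build_context(nodes, edges)
concepts = formal_concepts(context)

# Rank concepts by extent size; large extents are the schema candidates.
ranked = sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))
for extent, intent in ranked:
    if intent:  # skip the top concept, which has an empty intent here
        print(len(extent), sorted(extent), sorted(intent))
```

In this sketch a concept such as "all sections that export a topic" appears as an extent together with the attributes its members share. Whether such a concept is a useful schema candidate would still need the kind of evaluation the paper describes.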
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/3-540-45483-7_9