Efficient attribute-oriented generalization for knowledge discovery from large databases

We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute Oriented Generalization, a well known knowledge discovery from databases technique. GDBR and FIGR are both O(n) and, as such, are optimal. GDBR is an online...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on knowledge and data engineering 1998-03, Vol.10 (2), p.193-208
Hauptverfasser: Carter, C.L., Hamilton, H.J.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute Oriented Generalization, a well known knowledge discovery from databases technique. GDBR and FIGR are both O(n) and, as such, are optimal. GDBR is an online algorithm and requires only a small, constant amount of space. FIGR also requires a constant amount of space that is generally reasonable, although under certain circumstances, may grow large. FIGR is incremental, allowing changes to the database to be reflected in the generalization results without rereading input data. FIGR also allows fast regeneralization to both higher and lower levels of generality without rereading input. We compare GDBR and FIGR to two previous algorithms, LCHR and AOI, which are O(n log n) and O(np), respectively, where n is the number of input tuples and p the number of tuples in the generalized relation. Both require O(n) space that, for large input, causes memory problems. We implemented all four algorithms and ran empirical tests, and we found that GDBR and FIGR are faster. In addition, their runtimes increase only linearly as input size increases, while the runtimes of LCHR and AOI increase greatly when input size exceeds memory limitations.
ISSN:1041-4347
1558-2191
DOI:10.1109/69.683752