Efficient sampling of a relational database
A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are ar...
Gespeichert in:
Hauptverfasser: | , , , , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A system, method and computer readable medium for sampling data from a relational database are disclosed, where an information processing system chooses rows from a table in a relational database for sampling, wherein data values are arranged into rows, rows are arranged into pages, and pages are arranged into tables. Pages are chosen for sampling according to a probability P and rows in a selected page are chosen for sampling according to a probability R, so that the overall probability of choosing a row for sampling is Q=PR. The probabilities P and R are based on the desired precision of estimates computed from a sample, as well as processing speed. The probabilities P and R are further based on either catalog statistics of the relational database or a pilot sample of rows from the relational database. |
---|