Data Discovery


Bibliographic details
Main author: Aytas, Yusuf
Format: Book chapter
Language: English
Online access: Full text
Description
Summary: This chapter discusses the importance of data discovery for a modern Big Data platform, examines the link between data discovery and data governance, and explores the tooling used in Big Data discovery. Disparate data sources in Big Data systems make accessing metadata troublesome. A single source of metadata makes metadata available for further processing. Data lineage is partially available in workflow orchestration. Data lineage consists of multiple directed acyclic graphs (DAGs), where each node in a DAG corresponds to a table or a data structure. Responsibility and accountability are the driving factors for data ownership. A good presentation layer helps complete the feedback loop for data discovery. There is a close relationship between data discovery and data governance. Big Data governance depends on the data architecture, data source integration, and the factors that drive data governance. Apache Atlas is data governance software designed to collect, organize, and store metadata.
DOI:10.1002/9781119690962.ch9
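
Illustration: the summary describes data lineage as directed acyclic graphs whose nodes correspond to tables or data structures. The following is a minimal Python sketch of that idea, not the chapter's own implementation. The table names, the `LINEAGE` mapping, and the helpers `upstream` and `build_order` are all hypothetical; only `graphlib.TopologicalSorter` comes from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical lineage graph: each key is a table, and each value is the
# set of tables it is derived from (its direct upstream sources).
LINEAGE = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "customer_orders": {"clean_orders", "raw_customers"},
    "daily_revenue": {"customer_orders"},
}


def upstream(table, lineage):
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    stack = list(lineage.get(table, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, ()))
    return seen


def build_order(lineage):
    """Return the tables ordered so that each one follows all of its sources.

    Raises graphlib.CycleError if the lineage is not acyclic.
    """
    return list(TopologicalSorter(lineage).static_order())
```

Under these assumptions, `upstream("daily_revenue", LINEAGE)` answers the question "which tables feed this one?", and `build_order(LINEAGE)` gives one valid order in which the tables could be rebuilt.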