Managing provenance information for data processing pipeline


Bibliographic details
Main authors: SHUKLA AMIT, NAYAK SUSMITA, SVENSSON FINN
Format: Patent
Language: Chinese; English
Description
Abstract: A method is disclosed for managing provenance information associated with one or more interconnected provenance entities in a provenance system, over a network interface, for data processing pipelines in a distributed cloud environment, where each data processing pipeline is configured to read in data, transform the data, and output the transformed data. The method comprises the following steps performed by a configuration component: obtaining at least one declarative intent representing a configuration that indicates a requirement and a priority level for storing provenance information for each data processing pipeline; deriving, based on the obtained at least one declarative intent, requirements and priority levels for storing provenance information for each data processing pipeline, where one of the priority levels (a first priority level) is higher than the other (a second priority level); estimating a storage capacity for storing provenance information in the provenance system based on the derive
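The configuration-component steps in the abstract (obtain declarative intents, derive requirements with a first or second priority level, estimate storage capacity) can be illustrated with a small sketch. Everything here is hypothetical and not taken from the patent: the names DeclarativeIntent, ProvenanceRequirement, PriorityLevel, derive_requirements and estimate_capacity, the "high"/"low" labels, and the workload fields records_per_day, bytes_per_record and retention_days. Because the abstract breaks off before explaining how capacity is estimated, the size formula (records per day times bytes per record times retention days) is purely an assumption chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class PriorityLevel(IntEnum):
    # Hypothetical encoding: a larger value means a higher priority,
    # so FIRST outranks SECOND as the abstract requires.
    SECOND = 1
    FIRST = 2


@dataclass(frozen=True)
class DeclarativeIntent:
    """Hypothetical intent format for one pipeline (not the patent's schema)."""
    pipeline_id: str
    store_provenance: bool   # requirement: keep provenance for this pipeline?
    priority: str            # hypothetical labels: "high" or "low"
    records_per_day: int     # assumed workload hint
    bytes_per_record: int    # assumed size of one provenance record
    retention_days: int      # assumed retention window


@dataclass(frozen=True)
class ProvenanceRequirement:
    """Concrete requirement derived from one intent."""
    pipeline_id: str
    store_provenance: bool
    priority: PriorityLevel
    bytes_needed: int


def derive_requirements(intents):
    """Turn each declarative intent into a requirement with a priority level."""
    requirements = []
    for intent in intents:
        level = PriorityLevel.FIRST if intent.priority == "high" else PriorityLevel.SECOND
        # Assumed sizing rule; pipelines that keep no provenance need no space.
        needed = (
            intent.records_per_day * intent.bytes_per_record * intent.retention_days
            if intent.store_provenance
            else 0
        )
        requirements.append(
            ProvenanceRequirement(intent.pipeline_id, intent.store_provenance, level, needed)
        )
    return requirements


def estimate_capacity(requirements):
    """Assumed estimate: total bytes and the share demanded at the first priority level."""
    total = sum(r.bytes_needed for r in requirements)
    first = sum(r.bytes_needed for r in requirements if r.priority is PriorityLevel.FIRST)
    return {"total_bytes": total, "first_priority_bytes": first}


if __name__ == "__main__":
    intents = [
        DeclarativeIntent("ingest", True, "high", 1000, 512, 30),
        DeclarativeIntent("report", True, "low", 200, 256, 7),
        DeclarativeIntent("scratch", False, "low", 5000, 128, 1),
    ]
    print(estimate_capacity(derive_requirements(intents)))
    # prints {'total_bytes': 15718400, 'first_priority_bytes': 15360000}
```

Reporting the first-priority share separately is one possible design choice: if the provenance system cannot hold the total, a configuration component could still guarantee space for first-priority pipelines. Whether the patented method does this is not stated in the truncated abstract.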