Dynamic data-ingestion pipeline

Bibliographic details
Main authors: Qiao, Lin; Tu, Min; Surlaker, Kapil L; Liu, Ziyang; Veeramreddy, Narasimha R; Li, Yinan; Das, Shirshanka; Botev, Chavdar; Dai, Ying; Takiar, Sahil; Buenrostro, Issac; Goodhope, Kenneth D
Format: Patent
Language: English
Description
Summary: To ingest data from an arbitrary source in a set of sources, a computer system accesses predefined configuration instructions. Based on these instructions, it generates a dynamic data-ingestion pipeline that is compatible with a Hadoop file system. The pipeline includes a modular arrangement of operators drawn from a set that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data. The computer system then receives the data from the source and processes it with the dynamic data-ingestion pipeline as it is received, without storing the data in memory for the purpose of subsequent ingestion processing.
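
The arrangement described in the summary can be illustrated with a minimal sketch. It is not the patented implementation: the configuration format, the operator names ("extract", "convert", "quality_check"), the record shape, and every function name below are assumptions made for illustration only. The sketch assembles operators from a configuration list and chains them as Python generators, so each record passes through all stages as it arrives rather than being collected first. Writing the result to a Hadoop file system is left out.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical record shape: a flat mapping of field name to string value.
Record = Dict[str, str]
Operator = Callable[[Iterator[Record]], Iterator[Record]]


def make_extractor(fields: List[str]) -> Operator:
    """Extraction operator: keep only the fields of interest."""
    def extract(records: Iterator[Record]) -> Iterator[Record]:
        for record in records:
            yield {k: record[k] for k in fields if k in record}
    return extract


def convert(records: Iterator[Record]) -> Iterator[Record]:
    """Converter operator: normalize values (trim whitespace, lowercase)."""
    for record in records:
        yield {k: v.strip().lower() for k, v in record.items()}


def make_quality_checker(required: List[str]) -> Operator:
    """Quality-checker operator: drop records missing a required value."""
    def check(records: Iterator[Record]) -> Iterator[Record]:
        for record in records:
            if all(record.get(k) for k in required):
                yield record
    return check


# Hypothetical registry mapping configuration names to operator factories.
OPERATOR_FACTORIES: Dict[str, Callable[[dict], Operator]] = {
    "extract": lambda step: make_extractor(step["fields"]),
    "convert": lambda step: convert,
    "quality_check": lambda step: make_quality_checker(step["required"]),
}


def build_pipeline(config: List[dict]) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Assemble operators in the order the configuration lists them."""
    stages = [OPERATOR_FACTORIES[step["op"]](step) for step in config]

    def run(source: Iterable[Record]) -> Iterator[Record]:
        stream: Iterator[Record] = iter(source)
        for stage in stages:
            stream = stage(stream)  # lazy chaining: nothing is buffered here
        return stream

    return run


# Illustrative configuration; in practice it would be loaded from a file.
EXAMPLE_CONFIG = [
    {"op": "extract", "fields": ["id", "name"]},
    {"op": "convert"},
    {"op": "quality_check", "required": ["id", "name"]},
]
```

Because every stage is a generator, changing the configuration changes the pipeline without touching the operator code, and a record is only pulled from the source when the consumer asks for the next result.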