Rapid ETL method based on Spark SQL temporary view

Bibliographic details
Main authors: DENG WEIYU, LI ZHUANGZHUANG, LI XIANFENG, TAO TIANLIN, WANG KAI, WANG DONGDONG, ZHANG XIONGBIAO, ZHANG YONGQIANG
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to a rapid ETL method, device, equipment and medium based on Spark SQL temporary views. The method comprises: constructing an SQL statement for each ETL step; constructing the process nodes and a target process DAG for the whole ETL process from those SQL statements; creating a temporary view for each process node in turn through Spark, following the topological order of the target process DAG; and outputting the target data to the target database. Compared with the prior art, the method integrates Spark SQL with the DAG used in the ETL process, so that the DAG is ultimately expressed as one complete, end-to-end Spark SQL that runs on the Spark platform and provides the basic functions of an ETL tool, which improves development efficiency. Because the ETL process starts executing data processing actions only after the final temporary view has been constructed, hardware resources are saved and development cost is reduced. And t
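
The steps named in the summary (one SQL statement per ETL step, a DAG over the steps, temporary views created in topological order) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `Node`, `plan_views` and `run_etl`, the view names and the `raw_orders` source table are all hypothetical, and the statements are passed to a caller-supplied `execute` function so the example runs without a Spark installation.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One ETL step (hypothetical structure): a view name, the SELECT
    statement that defines it, and the views or tables it reads from."""
    name: str
    select_sql: str
    depends_on: Tuple[str, ...] = ()


def plan_views(nodes: List[Node]) -> List[str]:
    """Return one CREATE TEMPORARY VIEW statement per node, ordered so
    that every view is defined after the views it depends on."""
    by_name = {n.name: n for n in nodes}
    # Map each node to its predecessors; a cycle raises graphlib.CycleError.
    graph = {n.name: set(n.depends_on) for n in nodes}
    order = TopologicalSorter(graph).static_order()
    return [
        f"CREATE OR REPLACE TEMPORARY VIEW {name} AS {by_name[name].select_sql}"
        for name in order
        if name in by_name  # skip dependencies that are plain source tables
    ]


def run_etl(nodes: List[Node], execute: Callable[[str], object]) -> None:
    """Send each planned statement to `execute` (assumed here to be
    something like `spark.sql` on a live SparkSession)."""
    for statement in plan_views(nodes):
        execute(statement)


# Hypothetical two-step flow, deliberately listed out of order.
nodes = [
    Node("report",
         "SELECT region, SUM(amount) AS total FROM cleaned GROUP BY region",
         ("cleaned",)),
    Node("cleaned",
         "SELECT * FROM raw_orders WHERE amount IS NOT NULL",
         ("raw_orders",)),
]

for statement in plan_views(nodes):
    print(statement)
```

With a live SparkSession, `execute` could be `spark.sql`. Defining temporary views does not by itself run a Spark job, so in such a setup the data processing would only start when the final view is written out to the target database, which matches the deferred execution the summary describes.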