Hudi asynchronous data clustering method and system based on hotspot prediction

Bibliographic details
Main authors: GUO YING, YANG XIAOHAN, WU XIAOMING, ZHANG YING, ZHANG QIUPING, PAN JINGSHAN, YANG MEIHONG, LIU SHANGXU
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to a Hudi asynchronous data clustering method and system based on hotspot prediction. The method comprises the following steps. First, SQL query statements are analyzed: the statements are collected and segmented into words, a vocabulary is built, and an embedding layer is established. Second, based on the collected and analyzed SQL statements, a trained LSTM model based on an online learning algorithm predicts the query hotspot fields and the query hotspot tables. Third, the predicted hotspot fields and tables are obtained from the model, and asynchronous data clustering is carried out automatically, multiple times. The method avoids, to a certain extent, the data inconsistency that arises during asynchronous clustering, so that the data files in the partitions remain relatively fresh; in addition, by optimizing the hot data layout, the query efficiency when Hudi is used as the...
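The first and last steps of the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the tokenizer regex, the helper names (`tokenize_sql`, `build_vocabulary`, `encode`, `clustering_options`), and the reserved `<pad>`/`<unk>` tokens are assumptions made for illustration, and the LSTM prediction step is left out entirely. The two configuration keys are standard Hudi clustering options; how the patent actually feeds predicted hotspot fields into clustering is not stated in the record.

```python
import re

# Assumed tokenizer: identifiers/keywords, integers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize_sql(statement):
    """Split a SQL statement into lowercase word, number, and symbol tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(statement)]


def build_vocabulary(statements):
    """Assign an integer id to every distinct token.

    Id 0 is reserved for padding and id 1 for unknown tokens, so the ids
    can later index into an embedding layer.
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    for statement in statements:
        for tok in tokenize_sql(statement):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(statement, vocab):
    """Turn a statement into the id sequence a sequence model would consume."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize_sql(statement)]


def clustering_options(hot_columns):
    """Build Hudi write options that sort data by the predicted hot columns.

    Hypothetical glue code: the predicted hotspot fields are simply passed
    as the sort columns of Hudi's asynchronous clustering plan.
    """
    return {
        "hoodie.clustering.async.enabled": "true",
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(hot_columns),
    }


if __name__ == "__main__":
    history = ["SELECT user_id FROM orders WHERE amount > 10"]
    vocab = build_vocabulary(history)
    print(encode("SELECT amount FROM payments", vocab))
    print(clustering_options(["user_id", "amount"]))
```

In this sketch, tokens that never appeared in the collected statements map to the unknown id, which is one common way to keep the vocabulary fixed while a model is updated online.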