Hudi asynchronous data clustering method and system based on hotspot prediction
Main authors: | , , , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention relates to a Hudi asynchronous data clustering method and system based on hotspot prediction. The method comprises the following steps: (1) SQL query statement analysis: SQL statements are collected and tokenized, a vocabulary is built, and an embedding layer is established; (2) based on the collected and analyzed SQL statements, a trained LSTM model based on an online learning algorithm predicts the query hotspot fields and query hotspot tables; (3) the hotspot fields and hotspot tables predicted by that model are obtained, and asynchronous data clustering analysis is carried out automatically, multiple times. The method avoids, to a certain extent, the data inconsistency that can arise during asynchronous clustering, so that the data files in the partitions remain relatively fresh; in addition, by optimizing the hot data layout, the query efficiency when the Hudi is used as the |
---|
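The first step of the summary (collecting SQL statements, tokenizing them, and building a vocabulary that an embedding layer can index) can be illustrated with a minimal Python sketch. The record gives no implementation details, so everything here is an assumption made for illustration: the function names `tokenize_sql`, `build_vocabulary`, and `encode`, the regular-expression tokenizer, the lowercasing of tokens, the `<pad>` and `<unk>` entries, and the sample queries. It is not the patented method.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize_sql(statement):
    """Split one SQL statement into lowercase word, number, and symbol tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(statement)]


def build_vocabulary(statements, min_count=1):
    """Map each token seen at least min_count times to an integer id.

    Ids 0 and 1 are reserved for padding and unknown tokens (an assumption).
    """
    counts = Counter(tok for s in statements for tok in tokenize_sql(s))
    vocab = {"<pad>": 0, "<unk>": 1}
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(statement, vocab):
    """Turn a statement into the id sequence an embedding layer would consume."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize_sql(statement)]


# Illustrative query log (made-up table and column names).
queries = [
    "SELECT user_id, amount FROM orders WHERE region = 'EU'",
    "SELECT amount FROM orders WHERE user_id = 42",
]
vocab = build_vocabulary(queries)
ids = encode("SELECT price FROM orders", vocab)  # "price" maps to <unk>
```

In a fuller pipeline, id sequences like `ids` would be padded to a common length and passed through an embedding layer into the LSTM that scores candidate hotspot fields and tables; that part is omitted because the record does not describe the model's architecture or training procedure.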