Stream processing method and system for unstructured file based on distributed architecture


Bibliographic details
Main authors: GUO RENHUANG, ZHENG HANJUN, ZHENG SIDA, LIAO NING, QIU FENGXING, LIU FUJIAN
Format: Patent
Language: Chinese; English
Description
Summary: The invention provides a stream processing method for unstructured files based on a distributed architecture. The method comprises the following steps: obtaining an unstructured file and placing it on an FTP (File Transfer Protocol) server or in MinIO object storage; designing an FTP connector or a MinIO connector based on the Flink framework to read the unstructured file; dynamically processing the unstructured file on a distributed Flink deployment, and recording and storing progress information about the processing; integrating a Format processor into the FTP connector or the MinIO connector to parse and process the unstructured file; and writing Flink SQL that writes the processed data into a storage library. A large number of continuously generated unstructured files are read as a stream using the Flink distributed architecture, and the characteristics in stream processing are ap
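
The read, parse, write, and record-progress steps named in the summary can be illustrated with a small sketch. This is plain Python using only the standard library, not Flink code and not the patented implementation: a local directory stands in for the FTP or MinIO source, a list stands in for the storage library, and every name (parse_text_file, ProgressStore, process_new_files, demo) is hypothetical. It only shows the general idea that recording which files have been fully processed lets a restarted job skip them.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterator

# All names here are hypothetical; none come from the patent or from Flink.


def parse_text_file(path: Path) -> Iterator[dict]:
    """Stand-in for a format processor: turn one raw file into records."""
    lines = path.read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if line.strip():  # skip blank lines
            yield {"file": path.name, "line": line_no, "text": line.strip()}


class ProgressStore:
    """Persists the names of fully processed files as a JSON list."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.done = set(json.loads(path.read_text())) if path.exists() else set()

    def mark_done(self, name: str) -> None:
        self.done.add(name)
        self.path.write_text(json.dumps(sorted(self.done)))


def process_new_files(source_dir: Path, progress: ProgressStore,
                      parse: Callable[[Path], Iterator[dict]], sink: list) -> int:
    """Process files not yet in the progress store; return how many were handled."""
    handled = 0
    for path in sorted(source_dir.iterdir()):
        if not path.is_file() or path.name in progress.done:
            continue
        sink.extend(parse(path))       # parse the file and write records to the sink
        progress.mark_done(path.name)  # record progress only after the write
        handled += 1
    return handled


def demo() -> tuple:
    """Run two passes over a temporary directory, simulating a restart in between."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inbox = root / "inbox"
        inbox.mkdir()
        (inbox / "a.txt").write_text("alpha\n\nbeta\n", encoding="utf-8")
        sink: list = []
        progress_file = root / "progress.json"  # kept outside the inbox
        first = process_new_files(inbox, ProgressStore(progress_file),
                                  parse_text_file, sink)
        # A new file arrives; a fresh ProgressStore simulates a job restart.
        (inbox / "b.txt").write_text("gamma\n", encoding="utf-8")
        second = process_new_files(inbox, ProgressStore(progress_file),
                                   parse_text_file, sink)
        return first, second, sink
```

Calling demo() runs both passes; the second pass reloads the progress file and handles only the newly arrived file. In this sketch a crash between the sink write and mark_done would cause a file to be reprocessed, so it gives at-least-once behaviour; a real Flink deployment would rely on the framework's own checkpointing rather than a JSON file.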