Distributed multi-dimensional data screening method and device

Bibliographic details
Main authors: XIA YUNCHAO, TANG DONGHUA, LUO GUANGHAN, JIRI GALA, HENG CHENGFEI
Format: Patent
Language: Chinese; English
Description
Abstract: The invention relates to a distributed multi-dimensional data screening method and device, and belongs to the technical field of big data screening. After data sets are obtained from different data sources, the method and device clean and map the data sets to obtain data in a preset format, calculate a confidence for each field, and screen out the fields with higher confidence, thereby determining the final screening result. Because the data from multiple data sources are cleaned, mapped, and scored for confidence, the data finally screened out have high confidence, which improves the credibility and reliability of the screened data.
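
The steps named in the abstract (clean, map to a preset format, compute field confidence, keep high-confidence fields) can be illustrated with a minimal single-process sketch. The abstract does not say how confidence is calculated or how the work is distributed, so everything specific here is an assumption made for illustration: the source names, the `FIELD_MAPS` schema, the helper functions, the threshold, and the choice of confidence as the share of sources that agree on a field's most common value.

```python
from collections import defaultdict

# Hypothetical mapping from each source's field names to a shared preset schema.
FIELD_MAPS = {
    "source_a": {"Name": "name", "Tel": "phone"},
    "source_b": {"full_name": "name", "phone_no": "phone"},
    "source_c": {"name": "name", "phone": "phone"},
}


def clean(value):
    """Normalize a raw value: strip whitespace, treat empty strings as missing."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def to_preset_format(source, record):
    """Clean a raw record and rename its fields to the preset schema."""
    mapping = FIELD_MAPS[source]
    mapped = {}
    for raw_field, raw_value in record.items():
        value = clean(raw_value)
        if raw_field in mapping and value is not None:
            mapped[mapping[raw_field]] = value
    return mapped


def screen(records, threshold=0.5):
    """Keep, per field, the most supported value if its confidence is high enough.

    Confidence is an assumed measure, not the patent's: the share of
    reporting sources that agree on the winning value.
    """
    votes = defaultdict(lambda: defaultdict(int))
    for record in records:
        for field, value in record.items():
            votes[field][value] += 1

    result = {}
    for field, counts in votes.items():
        best_value, best_count = max(counts.items(), key=lambda item: item[1])
        confidence = best_count / sum(counts.values())
        if confidence >= threshold:
            result[field] = (best_value, confidence)
    return result


# Made-up sample records from three sources with differing field names.
raw = [
    ("source_a", {"Name": " Alice ", "Tel": "555-0100"}),
    ("source_b", {"full_name": "Alice", "phone_no": "555-0199"}),
    ("source_c", {"name": "Alice", "phone": " 555-0100"}),
]
preset = [to_preset_format(source, record) for source, record in raw]
screened = screen(preset, threshold=0.6)
```

With this sample data, all three sources agree on `name`, while only two of three agree on `phone`, so raising the threshold above two thirds would drop the `phone` field from the result. A distributed version would presumably run the cleaning and mapping step per source in parallel and merge the vote counts afterwards, but the abstract gives no detail on that.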