Data acquisition method based on media fusion

Bibliographic details
Main authors: LI MO, DING YUXIANG, HE CHENGLONG, BU HUAQI, GU XUEHAI
Format: Patent
Language: Chinese; English
Description
Summary: The invention provides a data acquisition method based on media fusion. It performs structured extraction and data fusion on multi-source heterogeneous media data, such as data from different kinds of apps, PC clients, and HTML pages. Massive heterogeneous data is acquired by integrating existing anti-crawling techniques, and the different kinds of raw material are extracted by category. This completes a preliminary collection of massive media data and builds a data reserve for subsequent data analysis. The method comprises three steps: 1, fusion of heterogeneous data sources; 2, junk data filtering; 3, text element extraction.
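
The three steps named in the summary (source fusion, junk filtering, text element extraction) can be illustrated with a minimal Python sketch. The record does not disclose how the patent implements any step, so everything here is an assumption made for illustration: the `Record` schema, the function names, the 20-character junk threshold, and the choice of ISO dates as the "text element" are all hypothetical.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Record:
    """Hypothetical common schema that all sources are fused into."""
    source: str  # e.g. "html" or "app"
    title: str
    body: str


class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def from_html(page: str) -> Record:
    """Step 1 (assumed form): map an HTML page onto the common schema."""
    collector = _TextCollector()
    collector.feed(page)
    collector.close()
    parts = collector.parts
    title = parts[0] if parts else ""
    return Record("html", title, " ".join(parts[1:]))


def from_app(payload: dict) -> Record:
    """Step 1 (assumed form): map an app's JSON-like payload onto the schema."""
    return Record("app", payload.get("title", ""), payload.get("content", ""))


def is_junk(record: Record, min_chars: int = 20) -> bool:
    """Step 2 (assumed rule): treat records with very short bodies as junk."""
    return len(record.body.strip()) < min_chars


def extract_elements(record: Record) -> dict:
    """Step 3 (assumed elements): pull the title and ISO dates from the text."""
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", record.body)
    return {"source": record.source, "title": record.title, "dates": dates}


def collect(html_pages, app_payloads):
    """Run the three steps in order over both source types."""
    fused = [from_html(p) for p in html_pages]
    fused += [from_app(p) for p in app_payloads]
    kept = [r for r in fused if not is_junk(r)]
    return [extract_elements(r) for r in kept]
```

Calling `collect` with a list of HTML strings and a list of app payload dictionaries returns one dictionary per record that survives the filter. A real system would need source-specific parsers and a far richer junk filter than a length check.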