Data acquisition method based on media fusion
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Summary: | The invention provides a data acquisition method based on media fusion. The method performs structured extraction and data fusion on multi-source heterogeneous media data, such as data from different kinds of apps, PC clients, and HTML pages. It acquires massive heterogeneous data by integrating existing anti-crawling techniques and extracts different kinds of raw material by category, thereby completing a preliminary collection of massive media data and building a data reserve for subsequent data analysis. The method comprises three steps: 1, fusing heterogeneous data sources; 2, filtering junk data; and 3, extracting text elements. |
---|
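The three steps named in the summary (fusing heterogeneous sources, filtering junk data, extracting text elements) can be illustrated with a minimal Python sketch. The abstract does not disclose how the patent implements any of these steps, so everything below is an assumption made for illustration: the `Record` schema, the function names, the JSON field names `title` and `content`, the junk heuristic, and the choice of dates as the extracted "text elements" are all hypothetical.

```python
import json
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Record:
    """Hypothetical common schema that every source is mapped into."""
    source: str
    title: str
    body: str


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def from_html(page: str) -> Record:
    # Assumption: the first visible text node is the title, the rest is body.
    parser = _TextExtractor()
    parser.feed(page)
    parser.close()
    title = parser.parts[0] if parser.parts else ""
    return Record("html", title, " ".join(parser.parts[1:]))


def from_app_json(payload: str) -> Record:
    # Assumption: app data arrives as JSON with "title" and "content" keys.
    data = json.loads(payload)
    return Record("app", data.get("title", ""), data.get("content", ""))


def is_junk(record: Record, min_chars: int = 20) -> bool:
    # Assumed heuristic: very short bodies or obvious ad phrases are junk.
    body = record.body.strip()
    has_ad_marker = re.search(r"click here|advertisement", body, re.I)
    return len(body) < min_chars or bool(has_ad_marker)


def extract_elements(record: Record) -> dict:
    # Assumed "text elements": ISO-style dates found in the body.
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", record.body)
    return {"source": record.source, "title": record.title, "dates": dates}


def run_pipeline(html_pages, app_payloads):
    # Step 1: fuse heterogeneous sources into one record list.
    records = [from_html(p) for p in html_pages]
    records += [from_app_json(p) for p in app_payloads]
    # Step 2: filter junk. Step 3: extract text elements.
    return [extract_elements(r) for r in records if not is_junk(r)]
```

Mapping every source into one record type first keeps the filtering and extraction steps independent of where the data came from; a real system would need per-source parsers and far more robust junk detection than this sketch suggests.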