WEB CONTENT EXTRACTION SYSTEM AND METHOD AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM

Bibliographic Details
Main Authors:	LI, Yi-An, CHEN, Yuan-Chang, LIN, Ming-Lu, LU, Hsin-Tse, YANG, Chao-Chin
Format:	Patent
Language:	eng ; fre ; ger
Subjects:	CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS

Description
Summary:	A web content extraction system includes a web structure analyzing module, a metadata determining module, a web correlation generating module and a storage path routing module. The web structure analyzing module is configured to divide the web content of a first web into a plurality of metadata and a plurality of ordinary data. The metadata determining module is configured to divide the plurality of metadata into a plurality of target metadata and a plurality of non-target metadata. The plurality of target metadata corresponds to a second web. The web correlation generating module is configured to generate correlation level information between the first web and the second web. The storage path routing module is configured to route the web content of the second web to a first storage path or a second storage path and to route the ordinary data to the first storage path.
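
The four modules named in the summary can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the summary does not say what counts as metadata, how the correlation level is computed, or what decides between the two storage paths. Here hyperlinks stand in for metadata, http(s) links stand in for target metadata, a same-host check stands in for the correlation level, and that level is assumed to pick the storage path. All function names and the keys `first_path` and `second_path` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _StructureParser(HTMLParser):
    """Collects link targets as 'metadata' and visible text as 'ordinary data'.

    Assumption: the summary does not define metadata; hyperlinks are a stand-in.
    """

    def __init__(self):
        super().__init__()
        self.metadata = []
        self.ordinary = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.metadata.append(href)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.ordinary.append(text)


def analyze_structure(html):
    """Web structure analyzing step: split content into metadata and ordinary data."""
    parser = _StructureParser()
    parser.feed(html)
    parser.close()
    return parser.metadata, parser.ordinary


def determine_metadata(metadata, base_url):
    """Metadata determining step: split into target and non-target metadata.

    Assumption: a target is any link that resolves to an http(s) URL,
    i.e. something that could be fetched as a 'second web'.
    """
    target, non_target = [], []
    for item in metadata:
        url = urljoin(base_url, item)
        if urlparse(url).scheme in ("http", "https"):
            target.append(url)
        else:
            non_target.append(item)
    return target, non_target


def correlation_level(first_url, second_url):
    """Web correlation generating step (assumed rule: same host means 'high')."""
    same_host = urlparse(first_url).netloc == urlparse(second_url).netloc
    return "high" if same_host else "low"


def route(ordinary, second_pages, first_url):
    """Storage path routing step.

    Ordinary data always goes to the first path, as in the summary.
    Assumption: second-web content goes to the first path when the
    correlation level is 'high', otherwise to the second path.
    second_pages maps each second-web URL to its already-fetched content.
    """
    storage = {"first_path": list(ordinary), "second_path": []}
    for url, content in second_pages.items():
        level = correlation_level(first_url, url)
        key = "first_path" if level == "high" else "second_path"
        storage[key].append(content)
    return storage
```

Called in order (`analyze_structure`, then `determine_metadata`, then `route` with the fetched content of each target URL), the sketch leaves the first page's ordinary data and the content of closely related pages under one path, and the content of loosely related pages under the other. Fetching the second web is left out on purpose so the example stays self-contained.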