WEB CONTENT EXTRACTION SYSTEM AND METHOD AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM

A web content extraction system includes a web structure analyzing module, a metadata determining module, a web correlation generating module and a storage path routing module. The web structure analyzing module is configured to divide a web content of a first web into a plurality of metadata and a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: LI, Yi-An, CHEN, Yuan-Chang, LIN, Ming-Lu, LU, Hsin-Tse, YANG, Chao-Chin
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A web content extraction system includes a web structure analyzing module, a metadata determining module, a web correlation generating module and a storage path routing module. The web structure analyzing module is configured to divide a web content of a first web into a plurality of metadata and a plurality of ordinary data. The metadata determining module is configured to divide the plurality of metadata into a plurality of target metadata and a plurality of non-target metadata. The plurality of target metadata is corresponding to a second web. The web correlation generating module is configured to generate a correlation level information between the first web and the second web. The storage path routing module is configured to route a web content of the second web to a first storage path or a second storage path and route the ordinary data to the first storage path.