WEB CONTENT EXTRACTION SYSTEM AND METHOD AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | English; French; German |
Summary: | A web content extraction system includes a web structure analyzing module, a metadata determining module, a web correlation generating module, and a storage path routing module. The web structure analyzing module is configured to divide the web content of a first web into a plurality of metadata and a plurality of ordinary data. The metadata determining module is configured to divide the plurality of metadata into a plurality of target metadata and a plurality of non-target metadata. The plurality of target metadata corresponds to a second web. The web correlation generating module is configured to generate correlation level information between the first web and the second web. The storage path routing module is configured to route the web content of the second web to a first storage path or a second storage path and to route the ordinary data to the first storage path. |
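
The four modules named in the summary can be pictured as a small pipeline. The following is only an illustrative sketch, not the patented method: the record does not say what counts as metadata, how the correlation level is computed, or what decides between the two storage paths. Here it is assumed that metadata are hyperlink targets, that target metadata are absolute http(s) links, that the correlation level is a Jaccard word overlap, and that a hypothetical `threshold` selects the storage path. All names (`StructureAnalyzer`, `split_metadata`, `correlation_level`, `route`, `fetch_second`) are invented for illustration.

```python
from html.parser import HTMLParser


class StructureAnalyzer(HTMLParser):
    """Splits a page into metadata (assumed: link targets) and ordinary data (text)."""

    def __init__(self):
        super().__init__()
        self.metadata = []
        self.ordinary = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.metadata.append(href)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.ordinary.append(text)


def split_metadata(metadata):
    """Assumption: target metadata are absolute http(s) links to a second page."""
    target = [m for m in metadata if m.startswith(("http://", "https://"))]
    non_target = [m for m in metadata if m not in target]
    return target, non_target


def correlation_level(first_text, second_text):
    """Assumption: correlation is the Jaccard overlap of the two word sets."""
    a = set(first_text.lower().split())
    b = set(second_text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def route(first_html, fetch_second, threshold=0.2):
    """Routes ordinary data to the first path and each second page by correlation.

    fetch_second is a caller-supplied function mapping a URL to page text;
    threshold is a hypothetical cut-off, not a value from the patent.
    """
    analyzer = StructureAnalyzer()
    analyzer.feed(first_html)
    target, _ = split_metadata(analyzer.metadata)
    first_text = " ".join(analyzer.ordinary)
    storage = {"first_path": list(analyzer.ordinary), "second_path": []}
    for url in target:
        second_text = fetch_second(url)
        level = correlation_level(first_text, second_text)
        key = "first_path" if level >= threshold else "second_path"
        storage[key].append(second_text)
    return storage
```

In this sketch a linked page that shares enough vocabulary with the first page is stored alongside the first page's ordinary data, and a weakly related one goes to the second path. The actual routing criterion in the patent may differ.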