A collection method based on an educational network information theme

Bibliographic details
Main authors: CHEN CHICHANG, YANG FAN
Format: Patent
Language: chi; eng
Description
Summary: The invention discloses a collection method based on an educational network information theme, which can ensure that a large number of URL addresses and webpage texts are highly correlated with the theme while improving the accuracy of collecting educational network information. The method comprises the following steps: collecting network pages; parsing and downloading the pages; extracting the page information; removing unrelated pages and unrelated URLs; performing duplicate removal on the remaining pages and URLs; storing the deduplicated pages in an education information base; extracting the URLs of the deduplicated pages and putting them into the acquired URL sequence; and supplying those URLs to the collector so that the webpages are acquired again. By adopting the collection method based on the education network information theme, the collection efficiency can be improved, and the collection effecti
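
The collection loop described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not say how relevance or duplication is measured, so the keyword test (`is_relevant`), the hash-based duplicate check (`fingerprint`), the anchor-text filter for URLs, and the in-memory `FAKE_WEB` used in place of real downloads are all assumptions made for illustration.

```python
import hashlib
from collections import deque

# Hypothetical theme vocabulary; the record does not define one.
THEME_KEYWORDS = {"education", "school", "course", "student", "teacher"}


def is_relevant(text, min_hits=2):
    """Assumed relevance test: count distinct theme keywords in the text."""
    words = set(text.lower().split())
    return len(words & THEME_KEYWORDS) >= min_hits


def fingerprint(text):
    """Assumed duplicate test: hash of the case- and whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def collect(seeds, fetch, max_pages=100):
    """Run the collect / filter / deduplicate / store / re-queue loop.

    `fetch(url)` returns (text, [(link_url, anchor_text), ...]) or None.
    """
    frontier = deque(seeds)   # the acquired URL sequence fed to the collector
    seen_urls = set(seeds)    # URL duplicate removal
    seen_pages = set()        # page duplicate removal
    info_base = {}            # stands in for the education information base
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        fetched += 1
        if page is None:
            continue
        text, links = page
        if not is_relevant(text):       # remove unrelated pages
            continue
        digest = fingerprint(text)
        if digest in seen_pages:        # remove duplicate pages
            continue
        seen_pages.add(digest)
        info_base[url] = text           # store the deduplicated page
        for link, anchor in links:      # extract URLs from the stored page
            # Drop URLs already seen or whose anchor text is off-theme.
            if link not in seen_urls and is_relevant(anchor, min_hits=1):
                seen_urls.add(link)
                frontier.append(link)   # re-acquisition on a later pass
    return info_base


# Tiny in-memory stand-in for the web, so the sketch runs offline.
FAKE_WEB = {
    "http://a.example/": (
        "School course catalog for every student",
        [("http://b.example/", "teacher resources"),
         ("http://c.example/", "car dealership"),
         ("http://d.example/", "student mirror")],
    ),
    "http://b.example/": (
        "Teacher guide to course planning",
        [("http://a.example/", "school home")],
    ),
    "http://c.example/": ("Cheap cars for sale", []),
    "http://d.example/": ("school   COURSE catalog for every student", []),
}

if __name__ == "__main__":
    stored = collect(["http://a.example/"], FAKE_WEB.get)
    print(sorted(stored))
```

In this toy run the off-theme link is never queued, and the mirror page is fetched but discarded because its normalized text matches a page already stored. A production collector would likely replace the keyword count with a stronger topical similarity measure and the exact hash with near-duplicate detection.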