Distributed crawler system supporting breakpoint task restart
Format: | Patent |
---|---|
Language: | Chinese ; English |
Summary: | The invention discloses a distributed crawler system supporting breakpoint task restart. The system comprises a master node responsible for task scheduling and a plurality of slave nodes responsible for task acquisition. The core of the master node is a task scheduler, which drives the scheduling loop of parsing, recording and distributing all pages to be acquired. The system operation steps comprise master node startup, task pool state persistence, dynamic weighted task distribution and slave node task processing. The beneficial effects are that the progress state of the task pool can be stored persistently, the whole crawler system or only some of the slave nodes can be restarted at any time, crawler tasks can be resumed from the breakpoint, and operating efficiency is improved. The situation that after the master node or a certain slave node breaks down, tasks under the node are lost, a |
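
The persistence and weighted-distribution steps named in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `TaskPool` class, its method names, the JSON file used as the persistent store, and the rule of giving each task to the slave whose load is smallest relative to its weight. The record does not say how the weights are computed or where the state is stored.

```python
import json
import os
import tempfile
from collections import deque


class TaskPool:
    """Hypothetical master-side task pool whose progress survives a restart."""

    def __init__(self, path):
        self.path = path
        self.pending = deque()  # URLs not yet handed to any slave
        self.in_flight = {}     # url -> id of the slave currently holding it
        self.done = set()       # URLs that have been fully processed
        if os.path.exists(path):
            self._load()

    def _load(self):
        with open(self.path, encoding="utf-8") as f:
            state = json.load(f)
        # Tasks that were in flight when the system stopped are re-queued
        # ahead of the rest, so a crash loses no work (the "breakpoint").
        self.pending = deque(state["in_flight"] + state["pending"])
        self.done = set(state["done"])

    def save(self):
        state = {
            "pending": list(self.pending),
            "in_flight": list(self.in_flight),
            "done": sorted(self.done),
        }
        # Write to a temporary file, then swap it in atomically, so a crash
        # in the middle of saving never leaves a half-written state file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)

    def add(self, url):
        # Linear scan of the queue is acceptable for a sketch only.
        if url in self.done or url in self.in_flight or url in self.pending:
            return
        self.pending.append(url)

    def distribute(self, weights):
        """Assign all pending tasks in proportion to per-slave weights."""
        assigned = {slave: [] for slave, w in weights.items() if w > 0}
        while self.pending and assigned:
            url = self.pending.popleft()
            # Pick the slave whose load relative to its weight is smallest.
            slave = min(
                assigned, key=lambda s: (len(assigned[s]) + 1) / weights[s]
            )
            assigned[slave].append(url)
            self.in_flight[url] = slave
        self.save()
        return assigned

    def complete(self, url):
        self.in_flight.pop(url, None)
        self.done.add(url)
        self.save()
```

In this sketch, constructing `TaskPool` again with the same path after a crash restores the finished set and puts every unfinished in-flight task back in the queue, which is one plausible reading of restarting "in a breakpoint manner". Restarting a single slave could be handled the same way by re-queuing only the tasks recorded under that slave's id.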