Scheduling High Performance Data Mining Tasks on a Data Grid Environment

Bibliographic details
Main authors:	Orlando, S.; Palmerini, P.; Perego, R.; Silvestri, F.
Format:	Book chapter
Language:	English
Subjects:	Applied sciences; Completion Time; Computer science; control theory; systems; Data Mining Algorithm; Exact sciences and technology; Execution Cost; Execution Time; Information systems. Data bases; Memory organisation. Data processing; Schedule Decision; Software
Online access:	Full text

Description
Summary:	Increasingly, the datasets used for data mining are becoming huge and physically distributed. Since the distributed knowledge discovery process is both data- and computation-intensive, the Grid is a natural platform for deploying a high performance data mining service. The focus of this paper is on the core services of such a Grid infrastructure. In particular, we concentrate our attention on the design and implementation of a specialized broker that is aware of data source locations and of the resource needs of data mining tasks. Allocation and scheduling decisions are taken on the basis of performance cost metrics and models that exploit knowledge about previous executions and use sampling to acquire estimates of execution behavior.
ISSN:	0302-9743, 1611-3349
DOI:	10.1007/3-540-45706-2_49