The HDFS Replica Placement Policies: A Comparative Experimental Investigation

Bibliographic Details
Main authors:	Fazul, Rhauani Weber Aita; Barcelos, Patrícia Pitthan
Format:	Book chapter
Language:	English
Subjects:	Block distribution; Computer Science; Data replication; Distributed file systems; Distributed, Parallel, and Cluster Computing; Replica placement policies
Online access:	Full text

Description
Summary:	The Hadoop Distributed File System (HDFS) is a robust and flexible file system designed for reliably storing large volumes of data in distributed environments. Its storage model relies upon data replication, and one of its central features is to optimize the placement of the replicas across the cluster for fault tolerance, availability, and performance. To this end, the Replica Placement Policy selects which nodes will store the data blocks. This work presents an experimental investigation of the different placement strategies available in HDFS. For a broader analysis, we consider different stages where the placement of the replicas is necessary, such as writing files in the system, re-replicating blocks among the nodes, and balancing the replica distribution in the cluster. The evaluation results allowed a deeper understanding of the behavior of the policies, in addition to highlighting the advantages and drawbacks of the replica placement concerning optimizations in data availability, data locality, write and read throughput, and in the overall performance of the HDFS.
ISSN:	0302-9743, 1611-3349
DOI:	10.1007/978-3-031-16092-9_10