Data Management in HDFS

Data Replication :

  • HDFS can store single large file across different machines. If the file is large, it is divided into a set of blocks.
  • All the blocks would be of same size except the last block. These blocks are replicated across different machines for fault tolerance.
  • We can configure the block size(default 64 MB) and replication factor(default 3). We can specify the replication factor when creating the fileand can change it later.
  • HDFS files are written once and read many times.
  • Namenode keeps track of replicas of any block. NameNode periodically receives Heartbeat and block-report messages from the DataNodes in the cluster.
  • Heartbeat messages received from from DataNodes tell NameNode about health of the DataNode and the block-report informs about all blocks on a DataNode and related file names.

How the replicas are stored in DataNodes:

  • The placement of replicas determines the reliability and performance of HDFS.
  • HDFS uses a rack-aware replica placement policy to improve data reliability, availability, and network bandwidth utilization.
  • Large HDFS instances run on machines spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.
  • Referring to the process of Hadoop Rack Awareness, NameNode finds the rack id of each DataNode. A simple policy is to place replicas on unique racks. This allows higher bandwidth because of read from multiple racks. Also prevents loss of data when an entire rack fails. This policy has a disadvantage of write delays during replication to different rack.
  • Optimal replica placement gives HDFS advantage over other distributed file systems. This requires some tuning.
  • With the default replication factor of three, HDFS’s placement policy puts one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance.

Replica Selection for read requests:

  • HDFS tries to satisfy a read request from a replica that is closest to the reader. If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request.
  • If HDFS cluster spans multiple data centers, then a local replica is preferred over any remote replica.


Namenode running in SafeMode:

  • On startup, the NameNode enters a special state called SafeMode. Replication of data blocks does not occur when the NameNode is in the Safemode state.
  • The NameNode receives Heartbeat and Blockreport messages from the DataNodes. A Blockreport contains the list of data blocks that a DataNode is hosting. Each block has a specified minimum number of replicas.
  • A block is considered safely replicated when the minimum number of replicas of that data block has checked in with the NameNode. After a configurable percentage of safely replicated data blocks checks in with the NameNode (plus an additional 30 seconds), the NameNode exits the Safemode state.
  • Namenode then determines the list of data blocks (if any) that still have fewer than the specified number of replicas. The NameNode then replicates these blocks to other DataNodes.