HDFS Failover Mechanism

Each DataNode periodically sends a Heartbeat message to the NameNode. If the NameNode does not receive a Heartbeat from a particular DataNode, it marks that DataNode as dead and stops forwarding new IO requests to it. The NameNode actively monitors the number of replicas of each data block and initiates replication whenever necessary. Re-replication may become necessary for several reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
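
As a rough illustration of the heartbeat bookkeeping described above (not the actual NameNode code), the sketch below marks a DataNode dead once no Heartbeat has been seen within a configured timeout; the class and method names are hypothetical.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of heartbeat tracking; not the real NameNode implementation.
    class HeartbeatMonitor {
        private final long deadNodeTimeoutMs;  // how long to wait before declaring a DataNode dead
        private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

        HeartbeatMonitor(long deadNodeTimeoutMs) {
            this.deadNodeTimeoutMs = deadNodeTimeoutMs;
        }

        // Called whenever a Heartbeat message arrives from a DataNode.
        void recordHeartbeat(String dataNodeId) {
            lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
        }

        // A DataNode is considered dead if its last Heartbeat is older than the timeout;
        // the NameNode would then stop routing IO to it and re-replicate its blocks.
        boolean isDead(String dataNodeId) {
            Long last = lastHeartbeat.get(dataNodeId);
            return last == null || System.currentTimeMillis() - last > deadNodeTimeoutMs;
        }
    }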

HDFS is designed to keep data accessible even though failures occur frequently. The common failure scenarios are described below.

Data Disk Failure: A hard disk may crash, making the data on that disk unavailable. In this case, IO requests for the affected blocks are redirected to replicas stored on other disks in the cluster.

NameNode failure: A NameNode failure can halt HDFS entirely if no backup is maintained. A backup of the NameNode components must therefore be configured, so that in the case of a NameNode failure the backup NameNode can take over.
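
In practice this is typically addressed with HDFS High Availability, where clients address a logical nameservice rather than a single NameNode host. The snippet below is a minimal client-side sketch; the nameservice name "mycluster" and the file path are assumptions, and the failover settings themselves live in the cluster configuration (hdfs-site.xml).

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HaClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml on the classpath
            // "mycluster" is an assumed logical nameservice; with HA configured, the client
            // fails over to the standby NameNode transparently if the active one goes down.
            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster/"), conf);
            System.out.println("File exists: " + fs.exists(new Path("/user/data/input.txt")));
            fs.close();
        }
    }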

DataNode failure: The NameNode detects this condition when no Heartbeat message is received from a DataNode. All the blocks registered to that DataNode become unavailable, and the NameNode redirects IO requests to DataNodes holding replicas of those blocks.

Network disruption: A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this through the absence of Heartbeats and treats the affected DataNodes as dead.

Corrective Actions Taken by HDFS

Cluster Rebalancing: HDFS provides a balancer utility that analyzes block placement and balances data across the DataNodes. It moves blocks until the cluster is deemed to be balanced, which means that the utilization of every DataNode (ratio of used space on the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space on the cluster to total capacity of the cluster) by no more than a given threshold percentage. The balancer does not balance between individual volumes on a single DataNode.
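
To make the threshold rule concrete, the following sketch (hypothetical names, not the actual Balancer code) computes node and cluster utilization and checks whether a DataNode is within the threshold of the cluster average.

    // Hypothetical illustration of the balancer's threshold rule; not the real Balancer code.
    class BalanceCheck {
        // Returns true if the node's utilization is within thresholdPercent of the cluster's utilization.
        static boolean isBalanced(long nodeUsed, long nodeCapacity,
                                  long clusterUsed, long clusterCapacity,
                                  double thresholdPercent) {
            double nodeUtil = 100.0 * nodeUsed / nodeCapacity;          // percent of this DataNode's space in use
            double clusterUtil = 100.0 * clusterUsed / clusterCapacity; // percent of the whole cluster's space in use
            return Math.abs(nodeUtil - clusterUtil) <= thresholdPercent;
        }

        public static void main(String[] args) {
            // Example: a node at 85% utilization in a cluster at 60% is out of balance for a 10% threshold.
            System.out.println(isBalanced(850, 1000, 6000, 10000, 10.0)); // prints false
        }
    }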

Data Integrity: A block of data fetched from a DataNode may arrive corrupted. This corruption can occur because of faults in a storage device, network faults, or buggy software. The HDFS client implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents it verifies that the data it received from each DataNode matches the checksum stored in the associated checksum file. If not, then the client can opt to retrieve that block from another DataNode that has a replica of that block.
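
As an illustration of the verification idea (HDFS uses its own per-chunk checksum format internally, so this is only a simplified sketch), the code below computes a CRC32 checksum when data is written and compares it against the stored value on read; a mismatch is what would trigger fetching the block from another replica.

    import java.util.zip.CRC32;

    // Simplified illustration of checksum verification; HDFS's actual checksum format differs.
    class ChecksumExample {
        static long checksumOf(byte[] block) {
            CRC32 crc = new CRC32();
            crc.update(block);
            return crc.getValue();
        }

        // Compare a freshly computed checksum against the one stored when the block was written.
        static boolean verify(byte[] blockFromDataNode, long storedChecksum) {
            return checksumOf(blockFromDataNode) == storedChecksum;
        }

        public static void main(String[] args) {
            byte[] block = "example block contents".getBytes();
            long stored = checksumOf(block);            // computed at write time
            block[0] ^= 0x01;                           // simulate corruption on disk or in transit
            System.out.println(verify(block, stored));  // prints false, so the client would try another replica
        }
    }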

Metadata Disk Failure: The FsImage and the EditLog are central data structures of HDFS. A corruption of these files can cause the HDFS instance to be non-functional. For this reason, the NameNode can be configured to support maintaining multiple copies of the FsImage and EditLog. Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously. This synchronous updating of multiple copies of the FsImage and EditLog may degrade the rate of namespace transactions per second that a NameNode can support. However, this degradation is acceptable because even though HDFS applications are very data intensive in nature, they are not metadata intensive. When a NameNode restarts, it selects the latest consistent FsImage and EditLog to use.
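
The trade-off described above can be illustrated with a toy sketch that writes every namespace transaction to several directories and flushes each copy before returning; the directory paths and class names are hypothetical, and this is not the real FSEditLog code.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.List;

    // Toy sketch of synchronously replicated edit logging; not the real FSEditLog implementation.
    class ReplicatedEditLog {
        private final List<Path> editLogDirs;  // e.g. one local disk and one remote (NFS) mount

        ReplicatedEditLog(List<Path> editLogDirs) {
            this.editLogDirs = editLogDirs;
        }

        // Each transaction is written and flushed to every configured directory before it is
        // considered durable, which is why extra copies lower the transactions-per-second rate.
        void logTransaction(String record) throws IOException {
            for (Path dir : editLogDirs) {
                try (FileWriter out = new FileWriter(dir.resolve("edits.log").toFile(), true)) {
                    out.write(record + System.lineSeparator());
                    out.flush();
                }
            }
        }
    }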

Snapshots

Snapshots support storing a copy of data at a particular instant of time. One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time.
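
A snapshot can be taken programmatically through the Hadoop FileSystem API, as in the minimal sketch below; the directory must first be made snapshottable by an administrator, and the nameservice, path, and snapshot name here are assumptions.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SnapshotExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster/"), new Configuration());
            // /user/data is an assumed directory that an administrator has allowed snapshots on
            // (e.g. via "hdfs dfsadmin -allowSnapshot /user/data").
            Path snapshot = fs.createSnapshot(new Path("/user/data"), "before-upgrade");
            System.out.println("Snapshot created at: " + snapshot);
            fs.close();
        }
    }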