Rack Awareness Algorithm

Block - A block is the smallest unit of data that HDFS reads or writes. HDFS stores each file as a sequence of blocks.

Default block size in Hadoop 1.x is 64 MB.
Default block size in Hadoop 2.x is 128 MB.

Files are split into 64 MB or 128 MB blocks, depending on the Hadoop version, and then stored in the Hadoop file system.
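
The split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not HDFS code: the function name `split_into_blocks` is made up for this post, and it assumes the last block simply holds whatever bytes are left over.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks needed for a file."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

# A 300 MB file with the Hadoop 2.x default block size
print([s // MB for s in split_into_blocks(300 * MB)])           # [128, 128, 44]
# The same file with the Hadoop 1.x default block size
print([s // MB for s in split_into_blocks(300 * MB, 64 * MB)])  # [64, 64, 64, 64, 44]
```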

Why HDFS blocks are large
HDFS blocks are large in order to reduce the cost of seeks relative to the time spent transferring data. Assume a typical seek time of about 10 ms and a disk transfer rate of about 100 MB/s. For the seek time to be 1% of the transfer time, a block must take about 1 second to transfer, so the block size should be about 100 MB. The default block sizes of 64 MB (Hadoop 1.x) and 128 MB (Hadoop 2.x) are in that range.
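
The calculation above can be checked with a small Python sketch. The function names are made up for this post, and the 10 ms and 100 MB/s figures are the rough assumptions from the paragraph, not measurements.

```python
def block_size_for_overhead(seek_time_s, transfer_rate_mb_s, overhead):
    """Block size (MB) at which seek time is `overhead` times the transfer time."""
    transfer_time_s = seek_time_s / overhead
    return transfer_time_s * transfer_rate_mb_s

def seek_overhead(block_size_mb, seek_time_s=0.010, transfer_rate_mb_s=100):
    """Seek time as a fraction of the time needed to transfer one block."""
    return seek_time_s / (block_size_mb / transfer_rate_mb_s)

# 10 ms seek, 100 MB/s transfer, seek should be 1% of transfer time
print(block_size_for_overhead(0.010, 100, 0.01))

# Overhead for the two default block sizes, as percentages
for size_mb in (64, 128):
    print(size_mb, "MB ->", round(seek_overhead(size_mb) * 100, 2), "%")
```

With these assumptions the first call gives a block size of 100 MB, and the overhead works out to roughly 1.6% for 64 MB blocks and 0.8% for 128 MB blocks.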


Rack - A rack is a collection of machines that are connected to the same network switch. If that switch goes down, every machine in the rack becomes unreachable.

The Rack Awareness algorithm came into the picture to overcome this problem.
With Rack Awareness, the NameNode chooses DataNodes that are on the same rack as the client or on a nearby rack.
The NameNode maintains the rack ID of each DataNode and uses this rack information when choosing DataNodes. It also ensures that the replicas of a block are not all stored on a single rack. The Rack Awareness algorithm therefore reduces latency and improves fault tolerance.
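
To make the idea of "closer" concrete, here is a minimal Python sketch. The node names, rack IDs and function names are hypothetical. The distance values (0 for the same node, 2 for the same rack, 4 for a different rack) follow the hop-count convention of Hadoop's network topology, simplified here to a single level of racks.

```python
# Hypothetical rack IDs, as a NameNode might record them for each DataNode
RACK_OF = {
    "dn1": "/rack1",
    "dn2": "/rack1",
    "dn3": "/rack2",
    "dn4": "/rack2",
}

def distance(a, b):
    """0 for the same node, 2 for the same rack, 4 for different racks."""
    if a == b:
        return 0
    if RACK_OF[a] == RACK_OF[b]:
        return 2
    return 4

def sort_by_proximity(reader, replicas):
    """Order the DataNodes holding a block from nearest to farthest."""
    return sorted(replicas, key=lambda dn: distance(reader, dn))

# A reader running on dn2 asks for a block stored on dn3, dn1 and dn2
print(sort_by_proximity("dn2", ["dn3", "dn1", "dn2"]))  # ['dn2', 'dn1', 'dn3']
```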

The default replication factor is 3. With that factor, the Rack Awareness algorithm places the replicas as follows:

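  • The first replica of the block is stored on a DataNode in the local rack (the writer's own node when the writer is a DataNode).
  • The second replica is stored on a DataNode in a different rack.
  • The third replica is stored on a different DataNode in the same rack as the second replica.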
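
A placement like this can be sketched in Python. The sketch follows the ordering given in the Apache HDFS architecture guide (writer's node first, then a node on a remote rack, then a second node on that same remote rack); some descriptions put the second replica on the local rack instead. Either way, two replicas share one rack and the remaining replica sits on another. The function `choose_targets` and the cluster layout are made up for illustration, and the sketch assumes the writer is itself a DataNode and that every rack has at least two nodes.

```python
import random

def choose_targets(writer, rack_of, rng=random):
    """Pick three DataNodes for the replicas of one block (illustrative only)."""
    first = writer  # assumption: the writer runs on a DataNode
    local_rack = rack_of[first]

    # Second replica: any node that is NOT on the writer's rack
    remote_nodes = sorted(dn for dn, rack in rack_of.items() if rack != local_rack)
    second = rng.choice(remote_nodes)

    # Third replica: a different node on the same rack as the second replica
    rack_mates = [dn for dn in remote_nodes
                  if rack_of[dn] == rack_of[second] and dn != second]
    third = rng.choice(rack_mates)
    return [first, second, third]

# Hypothetical cluster: three racks with two DataNodes each
rack_of = {
    "dn1": "/rack1", "dn2": "/rack1",
    "dn3": "/rack2", "dn4": "/rack2",
    "dn5": "/rack3", "dn6": "/rack3",
}

targets = choose_targets("dn1", rack_of, random.Random(0))
print(targets, [rack_of[dn] for dn in targets])
```

Whatever nodes the random choice picks, the three replicas always end up on exactly two racks, so the loss of one rack never removes every copy of the block.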

Note - Hadoop creates three replicas by default. If a DataNode fails, Hadoop automatically creates new copies of the blocks it held, so that each block is back to three replicas.
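
The bookkeeping behind this can be pictured with a short Python sketch. It is not how HDFS is implemented; the block names, node names and the function `under_replicated` are hypothetical, and the sketch only shows how missing replicas could be counted once a node is known to be down.

```python
def under_replicated(block_locations, live_nodes, replication=3):
    """Map each block to the number of replicas it is currently missing."""
    missing = {}
    for block, nodes in block_locations.items():
        live = [dn for dn in nodes if dn in live_nodes]
        if len(live) < replication:
            missing[block] = replication - len(live)
    return missing

# Hypothetical block map: which DataNodes hold each block
locations = {
    "blk_1": ["dn1", "dn3", "dn4"],
    "blk_2": ["dn2", "dn5", "dn6"],
}
live = {"dn2", "dn3", "dn4", "dn5", "dn6"}  # dn1 has failed

print(under_replicated(locations, live))  # {'blk_1': 1}
```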
