HDFS

What is HDFS?

Before discussing HDFS, let us first discuss

Scalability

The primary benefit of Hadoop is its Scalability. One can easily scale the cluster by adding more nodes.

There are two types of Scalability in Hadoop: Vertical and Horizontal

Vertical Scalability - It is also referred as “scale up”. In vertical scaling, you can increase the hardware capacity of the individual machine. In other words, you can add more RAM or CPU to your existing system to make it more robust and powerful.

Horizontal Scalability - It is also referred as “scale out” is basically the addition of more machines or setting up the cluster. In horizontal scaling instead of increasing hardware capacity of individual machine you add more nodes to existing cluster and most importantly, you can add more machines without stopping the system.


HDFS

  1. HDFS stand for Hadoop Distributed File Storage. HDFS provides better data throughput than traditional file systems.
  2. It provides a way to manage large amounts of structured and unstructured data.
  3. HDFS is built in such a way, It can detect failure and automatically recover on its own.
  4. It can be used to scale a Hadoop cluster to hundreds/thousands of nodes.
  5. HDFS is designed to support very large files. It splits these large files into blocks.
  6. You can replicate HDFS data from one HDFS service to another. Data blocks are replicated to provide fault tolerance, and an application can specify the number of replicas of a file.
  7. HDFS is designed on principle of write-once and read-many-times. Once data is written large portions of data set can be processed any number times.
  8. There are two components of HDFS - NameNode and DataNode. While there is only one name node, there can be multiple data nodes.

Comments

Post a Comment

If you have any doubts, Please let me know

Popular posts from this blog

Hadoop

Rack Awareness Algorithm

Big Data