HDFS
What is HDFS?
Before discussing HDFS, let us first discuss scalability.
Scalability
The primary benefit of Hadoop is its scalability: you can easily scale the cluster by adding more nodes.
There are two types of scalability in Hadoop: vertical and horizontal.
Vertical Scalability - Also referred to as "scaling up". In vertical scaling, you increase the hardware capacity of an individual machine; in other words, you add more RAM or CPU to your existing system to make it more robust and powerful.
Horizontal Scalability - Also referred to as "scaling out", this is basically the addition of more machines to the cluster. In horizontal scaling, instead of increasing the hardware capacity of an individual machine, you add more nodes to the existing cluster, and most importantly, you can add these machines without stopping the system (a small client-side sketch of this follows below).
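To make the "add machines without stopping the system" point concrete, here is a minimal Java sketch using the HDFS client API. The NameNode address (hdfs://namenode:9000) is an assumption for illustration; it lists the DataNodes the NameNode currently knows about, and nodes added while the cluster is running show up in this list without a restart.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

import java.net.URI;

public class ListDataNodes {
    public static void main(String[] args) throws Exception {
        // The NameNode URI below is an assumption; replace it with the
        // fs.defaultFS of your own cluster.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // getDataNodeStats() is specific to DistributedFileSystem, the
        // client-side implementation backing an hdfs:// FileSystem.
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        for (DatanodeInfo node : dfs.getDataNodeStats()) {
            // Each entry is one DataNode currently registered with the NameNode;
            // machines added to a running cluster appear here without downtime.
            System.out.println(node.getHostName()
                    + " - capacity: " + node.getCapacity() + " bytes");
        }
        fs.close();
    }
}
```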
HDFS
- HDFS stands for Hadoop Distributed File System. HDFS provides better data throughput than traditional file systems.
- It provides a way to manage large amounts of structured and unstructured data.
- HDFS is built in such a way that it can detect failures and automatically recover from them on its own.
- It can be used to scale a Hadoop cluster to hundreds/thousands of nodes.
- HDFS is designed to support very large files. It splits these large files into fixed-size blocks (128 MB by default in current Hadoop releases).
- You can replicate HDFS data from one HDFS service to another. Data blocks are replicated to provide fault tolerance (three replicas by default), and an application can specify the number of replicas of a file.
- HDFS is designed on the principle of write-once, read-many-times: once data is written, large portions of the data set can be processed any number of times (see the client-side sketch after this list).
- There are two types of components in HDFS - the NameNode and the DataNode. The NameNode stores the file system metadata, while the DataNodes store the actual data blocks; a cluster typically has a single active NameNode and many DataNodes.
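To tie these points together, here is a minimal Java sketch of a client writing a file once with an explicit replication factor and block size and then reading it back. The NameNode URI and the file path are assumptions for illustration, not part of the post.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        // The NameNode URI and file path are assumptions for this example.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        Path file = new Path("/user/demo/sample.txt");

        // Write once: create the file with a replication factor of 3 and a
        // 128 MB block size (the usual defaults, made explicit here).
        short replication = 3;
        long blockSize = 128L * 1024 * 1024;
        try (FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize)) {
            out.write("hello from HDFS\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read many: the same file can be opened and scanned any number of times.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        // The NameNode tracks replication and block size as file metadata.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("replication=" + status.getReplication()
                + ", blockSize=" + status.getBlockSize());
        fs.close();
    }
}
```

Note that the replication factor and block size are per-file settings; if you omit them, the cluster-wide defaults from the configuration are used.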