Hadoop
What is Hadoop?
- Hadoop is a framework that allows us to store and process large data sets in a parallel and distributed fashion. It was created by Doug Cutting and Mike Cafarella in the mid-2000s, and Cutting continued to develop it after joining Yahoo.
- It efficiently stores and processes large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers so that massive datasets can be analyzed in parallel and therefore more quickly.
- Hadoop is used for batch processing: data is first collected and loaded, then processed as a whole, and the result is produced once the batch finishes.
- It uses a master-slave architecture in which one master and multiple slaves work together in a cluster.
- Hadoop runs on commodity hardware, so it is cost effective.
- It is scalable as per requirement.
- Hadoop deals with flat files in any format.
- Hadoop nodes communicate using RPC. By default, each slave sends a heartbeat to the master every 3 seconds.
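The heartbeat idea from the last point can be sketched in a few lines of Python. This is only a toy model, not Hadoop's actual code: the `Master` class, the node names and the `DEAD_AFTER_MISSED` threshold are all made up for illustration. Only the 3 second interval comes from the text above.

```python
HEARTBEAT_INTERVAL_S = 3   # default interval mentioned above
DEAD_AFTER_MISSED = 10     # hypothetical threshold, chosen only for this sketch


class Master:
    """Toy master that remembers when each slave last reported in."""

    def __init__(self):
        self.last_seen = {}  # node id -> time of last heartbeat (seconds)

    def heartbeat(self, node_id, now):
        # Called by a slave every HEARTBEAT_INTERVAL_S seconds.
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        # A node counts as alive if it reported within the allowed window.
        limit = HEARTBEAT_INTERVAL_S * DEAD_AFTER_MISSED
        return sorted(n for n, t in self.last_seen.items() if now - t <= limit)


master = Master()
master.heartbeat("node-1", now=0)
master.heartbeat("node-2", now=0)
master.heartbeat("node-1", now=27)   # node-2 has gone silent
print(master.live_nodes(now=31))     # ['node-1']
```

Passing `now` explicitly instead of reading the clock keeps the sketch easy to follow and to test.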
There are three components of Hadoop:
- HDFS - Hadoop Distributed File System, used for distributed storage.
- MR - MapReduce, used for distributed parallel processing.
- YARN - Yet Another Resource Negotiator, used for resource management.
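To get a feel for the MapReduce part, here is a minimal word count written in plain Python. It does not use Hadoop at all and runs on a single machine; the function names `map_phase`, `shuffle` and `reduce_phase` are my own labels for the three steps, not Hadoop APIs.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine the values of one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big cluster"]))
# {'big': 2, 'data': 1, 'cluster': 1}
```

In a real cluster the map calls run on different nodes, close to the blocks they read, and the shuffle moves data over the network. The logic of the three steps stays the same.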
Hadoop Functionality:
- Hadoop automatically splits the file into blocks.
- Hadoop automatically replicates each block across multiple nodes.
- Hadoop automatically maintains the metadata.
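The splitting and replication above can be illustrated with a small calculation. The sketch assumes the usual HDFS defaults of a 128 MB block size and a replication factor of 3. The round-robin placement is a simplification of my own; real HDFS uses a rack-aware placement policy. The node names are hypothetical.

```python
BLOCK_SIZE_MB = 128   # assumed HDFS default block size
REPLICATION = 3       # assumed HDFS default replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last one may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Simplified round-robin placement (needs len(nodes) >= replication)."""
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(300)
print(blocks)  # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

So a 300 MB file becomes three blocks, and with three copies of each block the cluster stores nine block replicas in total.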
Hadoop Cluster:
Hadoop works by creating a cluster. A cluster is a collection of nodes. There are two main types of node in HDFS: NameNode and DataNode.
DataNode:
DataNode is also known as the slave node. DataNodes store the actual data of HDFS.
NameNode:
NameNode is also known as the master node. The NameNode stores the metadata of HDFS.
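The split between metadata and actual data can be shown with a toy model. The classes below only borrow the names NameNode and DataNode. They are not the Hadoop classes, and the block ids and file path are made up for the example.

```python
class DataNode:
    """Toy slave node: holds the actual bytes of each block."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Toy master node: holds only metadata, never the file contents."""

    def __init__(self):
        self.files = {}  # path -> list of (block id, [DataNode names])

    def add_file(self, path, block_locations):
        self.files[path] = block_locations

    def locate(self, path):
        return self.files[path]


def read_file(path, namenode, datanodes):
    # Ask the NameNode where the blocks are, then fetch them from DataNodes.
    data = b""
    for block_id, holders in namenode.locate(path):
        data += datanodes[holders[0]].read(block_id)
    return data


datanodes = {"dn1": DataNode("dn1"), "dn2": DataNode("dn2")}
datanodes["dn1"].write("blk_1", b"hello ")
datanodes["dn2"].write("blk_2", b"hadoop")

namenode = NameNode()
namenode.add_file("/demo.txt", [("blk_1", ["dn1"]), ("blk_2", ["dn2"])])
print(read_file("/demo.txt", namenode, datanodes))  # b'hello hadoop'
```

The NameNode never touches the file contents here. It only answers the question of which DataNode holds which block.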
Different tools can be used for Big Data:
- Hive
- Cassandra
- HBase
- Bigtable
- Sqoop
- Pig
- ZooKeeper
- NoSQL
- Mahout
- Oozie
- Flume
- Spark
- Impala
- MongoDB
- Solr, Lucene