Big Data
What is Big Data?
Big Data refers to data sets so large or complex that they cannot be efficiently managed by common database management systems. Such data sets may be analyzed computationally to reveal patterns, trends, and associations, especially those relating to human behavior and interaction.
Data is increasing day by day
1 ZB = 1000 EB
1 EB = 1000 PB
1 PB = 1000 TB
1 TB = 1000 GB
1 GB = 1000 MB
1 MB = 1000 KB
1 KB = 1000 bytes
Note - According to an IBM estimate, around 2.5 quintillion (2.5 × 10^18) bytes of data are generated every day all over the world.
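Each unit in the list above is 1000 times the one below it, so converting a raw byte count is just repeated division by 1000. Here is a minimal Python sketch that applies this to the IBM figure. It assumes the decimal (SI) units used above, not the 1024-based binary units some tools report, and the `human_readable` helper and `UNITS` list are illustrative names, not from any library.

```python
# Decimal (SI) units, each 1000 times the previous one, as in the list above.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes: float) -> str:
    """Scale a byte count down by 1000 at a time until it is below 1000 (or the unit reaches ZB)."""
    value = float(num_bytes)
    index = 0
    # Stop when the number is small enough or there is no larger unit left.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"


daily_bytes = 2.5 * 10**18  # 2.5 quintillion bytes
print(human_readable(daily_bytes))  # 2.5 EB
```

So the daily figure from the note works out to roughly 2.5 EB per day.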
Some Facts
- Every sec - 822 tweets on Twitter
- Every min - 510 comments on Facebook
- Every min - 293,000 statuses on Facebook
- Every min - 136,000 photo uploads on Facebook
- Every hour - Walmart handles 1 million customer transactions
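To get a feel for these rates over a whole day, each figure can be multiplied out to 24 hours. The sketch below assumes each rate stays constant around the clock, which real traffic does not, so treat the results as rough extrapolations of the numbers above rather than measured daily totals. `per_day` and `PERIOD_SECONDS` are hypothetical names used only for illustration.

```python
# Length of each reporting period in seconds.
PERIOD_SECONDS = {"sec": 1, "min": 60, "hour": 60 * 60}
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(count: int, period: str) -> int:
    """Scale a count observed per `period` up to one day, assuming a constant rate."""
    return count * SECONDS_PER_DAY // PERIOD_SECONDS[period]


facts = [
    ("tweets on Twitter", 822, "sec"),
    ("comments on Facebook", 510, "min"),
    ("statuses on Facebook", 293_000, "min"),
    ("photo uploads on Facebook", 136_000, "min"),
    ("Walmart customer transactions", 1_000_000, "hour"),
]

for label, count, period in facts:
    print(f"{label}: {per_day(count, period):,} per day")
```

For example, 822 tweets per second becomes 822 × 86,400 = 71,020,800 tweets per day.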
Challenges
- In collecting data.
- In storing data.
- In processing data.
- In sharing and securing data.
- In deriving meaningful insights from the data and putting them to use.
Where we can lose data
- In power failure - If the electricity goes out, the system goes down and we can lose our data.
- In hardware/software failure - If the hardware or software fails, the system goes down and we can lose our data.
- In network failure - If the network goes down, the cluster can go down and we can lose our data.
Four V's of Big Data
1. Volume -
It refers to the sheer amount of data, which now frequently runs to terabytes or even petabytes. Data only counts as Big Data once it reaches a certain volume.
2. Velocity -
It refers to the speed at which data is generated. Data arrives at such a pace that it requires specialized, distributed processing techniques.
3. Variety -
It refers to the different types of data, which come from different sources.
Data is divided into three types: structured, semi-structured, and unstructured.
4. Veracity -
It refers to the quality and trustworthiness of the data being analyzed.
Uncertainty of data - The data can contain missing values and outliers.
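The two veracity problems named here, missing values and outliers, can be detected mechanically. The following Python sketch counts missing entries (represented as `None`, an assumption) and flags outliers with the 1.5 × IQR rule (Tukey's fences), which is one common choice; the post itself does not prescribe a method. The `veracity_report` function and the sample readings are hypothetical.

```python
import statistics


def veracity_report(values, k=1.5):
    """Count missing entries (None) and flag outliers using Tukey's fences (k * IQR)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the non-missing values; needs at least two of them.
    q1, _median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Hypothetical sensor readings with two gaps and one suspicious spike.
readings = [21.0, 22.5, None, 21.8, 23.1, 22.0, None, 95.0, 21.4]
print(veracity_report(readings))  # {'missing': 2, 'outliers': [95.0]}
```

Detecting the problems is only the first step; a real pipeline still has to decide whether to drop, impute, or investigate the flagged values.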
Nice bro, please keep writing. I understand it really well.
Sure, thanks!
As you mentioned under Variety (one of the V's of Big Data), there are three types of data. On what basis is data categorized as unstructured or semi-structured?
Yes, it is categorized on the following basis:
(a) If data is in the form of rows and columns, it is structured data.
(b) If data is in the form of emails, CSV, XML, or JSON documents, it is semi-structured data.
(c) If data is in the form of audio or video files, it is unstructured data.
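The three rules in the reply can be approximated with a simple extension-based classifier. This is only a sketch: the extension sets and the `classify` name are hypothetical examples chosen for illustration, and a file's extension does not guarantee what is actually inside it.

```python
from pathlib import Path

# Hypothetical extension sets, one per rule in the reply.
STRUCTURED = {".db", ".sqlite"}                      # (a) rows and columns, e.g. database tables
SEMI_STRUCTURED = {".eml", ".csv", ".xml", ".json"}  # (b) email, CSV, XML, JSON documents
UNSTRUCTURED = {".mp3", ".wav", ".mp4", ".avi"}      # (c) audio and video files


def classify(filename: str) -> str:
    """Guess the data type of a file from its (case-insensitive) extension alone."""
    ext = Path(filename).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in SEMI_STRUCTURED:
        return "semi-structured"
    if ext in UNSTRUCTURED:
        return "unstructured"
    return "unknown"


for name in ["orders.sqlite", "mail.eml", "config.JSON", "lecture.mp4", "notes.txt"]:
    print(name, "->", classify(name))
```

The sketch follows the reply in placing CSV under semi-structured; many sources count CSV as structured because it is plain rows and columns, so the sets can be adjusted to whichever convention you follow.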