Q/A (Big Data) 2
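Q - In which type of database does Hive store its metadata?
A - In a relational database (RDBMS). The metastore uses embedded Apache Derby by default; MySQL is a common choice in production.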
Q - Why analyze data?
A - (i) Exponential growth in machine data over the last few years.
(ii) Growing number of machines and increasing use of IoT devices.
Q - Where is large data generated from?
- Sensor data
- Machine data
- Business data
Q - What is the communication model in Hadoop?
- Hadoop uses RPC (remote procedure calls) for communication between nodes.
- A heartbeat is sent every 3 seconds (by default).
- The slaves initiate the communication and report to the master.
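The heartbeat idea above can be sketched as a small simulation. This is only a toy model in plain Python, not Hadoop's RPC code: the class `Master`, the function `simulate`, the slave names, and the 10-second timeout are all hypothetical and chosen for illustration. Only the 3-second interval comes from the answer above.

```python
HEARTBEAT_INTERVAL_SEC = 3  # default interval quoted in the answer above


class Master:
    """Toy master that remembers when each slave last sent a heartbeat."""

    def __init__(self, dead_after_sec):
        # Illustrative timeout only; it is not a Hadoop default.
        self.dead_after_sec = dead_after_sec
        self.last_seen = {}

    def receive_heartbeat(self, slave_id, now):
        # The slave initiates the call; the master only records it.
        self.last_seen[slave_id] = now

    def live_slaves(self, now):
        # A slave counts as alive if its last heartbeat is recent enough.
        return sorted(
            s for s, t in self.last_seen.items() if now - t <= self.dead_after_sec
        )


def simulate(total_sec, failed_slave, fail_at_sec):
    """Run a simulated clock; `failed_slave` stops sending at `fail_at_sec`."""
    master = Master(dead_after_sec=10)
    slaves = ["slave-1", "slave-2"]  # hypothetical node names
    for now in range(0, total_sec + 1, HEARTBEAT_INTERVAL_SEC):
        for slave in slaves:
            if slave == failed_slave and now >= fail_at_sec:
                continue  # a failed slave sends nothing
            master.receive_heartbeat(slave, now)
    return master.live_slaves(total_sec)
```

For example, `simulate(total_sec=30, failed_slave="slave-2", fail_at_sec=9)` returns `['slave-1']`, because the master last heard from slave-2 at second 6 and that is older than the timeout.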
Q - What are the important configuration files in Hadoop?
- hdfs-site.xml – HDFS settings such as block replication
- core-site.xml – I/O settings and core settings of the Hadoop cluster
- hadoop-env.sh – Environment setup
- masters – Hosts that run the Secondary NameNode
- slaves – Hosts that run the slave daemons (DataNodes)
- mapred-site.xml – MapReduce settings
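As a rough illustration of what these *-site.xml files contain, the sketch below builds and reads back a minimal configuration document with Python's standard library. The helper names `build_site_xml` and `read_site_xml` are hypothetical; `dfs.replication` is the HDFS property for block replication, and the value 3 is used here only as an example.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return a Hadoop-style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse a <configuration> document back into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


# Example content for hdfs-site.xml: keep three copies of every block.
hdfs_site = build_site_xml({"dfs.replication": 3})
```

Every site file follows the same shape: a `configuration` root with one `property` element (a `name` and a `value`) per setting.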
Q - How is Big Data an opportunity?
- Cost reduction - Cost-effective storage system for huge data sets.
- Next-generation products - Self-driving cars, health care.
- Faster and better decision making - Provides ways to analyze information quickly and make decisions.
- Improved services or products - Evaluation of customer needs and satisfaction.
Q - What are the different file formats in Hadoop or Hive?
- Text/CSV
- SequenceFile
- Avro
- Parquet
- RCFile (Record Columnar File)
- ORC (Optimized Row Columnar)
Q - What is the difference between Avro and Parquet?
Avro - Avro is a row-based storage format. Each file includes the schema of the data, defined in JSON, which improves interoperability and allows the schema to evolve. Avro files are also splittable and support block compression, making Avro a good choice for most Hadoop use cases.
Parquet - Parquet is a column-based binary storage format that can store nested data structures. It is very efficient in terms of disk I/O when a query specifies only the columns it needs. It is well optimized for use with Cloudera Impala.
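The row-based versus column-based difference can be shown with a toy example. This sketch does not use the real Avro or Parquet formats; it only models the two layouts with plain Python structures and counts how many values a "sum of one column" query has to read. The sample records and all function names are made up for illustration.

```python
COLUMNS = ("id", "name", "price")

RECORDS = [
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 12.5},
    {"id": 3, "name": "c", "price": 7.5},
]


def to_row_layout(records):
    # Row-based (Avro-like): all fields of one record are stored together.
    return [tuple(r[c] for c in COLUMNS) for r in records]


def to_column_layout(records):
    # Column-based (Parquet-like): all values of one column are stored together.
    return {c: [r[c] for r in records] for c in COLUMNS}


def sum_price_rows(rows):
    """Sum the price column; every field of every row is read."""
    idx = COLUMNS.index("price")
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)
        total += row[idx]
    return total, values_read


def sum_price_columns(columns):
    """Sum the price column; only that column's values are read."""
    prices = columns["price"]
    return sum(prices), len(prices)
```

With the three sample records, the row layout reads 9 values to compute the sum while the column layout reads 3, which is the intuition behind Parquet's lower disk I/O for queries that touch few columns.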
Q - What is a checkpoint in Hadoop?
- Every change on the NameNode is recorded in the edit log. At regular intervals the edit log is merged into the fsimage; this merge is called a checkpoint.
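The merge step can be sketched in a few lines. This is a simplified model, not the NameNode's actual implementation: the fsimage is represented as a set of paths, the edit log as a list of `(operation, path)` tuples, and the names `apply_edit` and `checkpoint` are hypothetical.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the fsimage.

    Returns the new fsimage and an empty edit log, since the merged
    edits no longer need to be replayed.
    """
    new_image = set(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


old_image = {"/a"}
edits = [("create", "/b"), ("delete", "/a"), ("create", "/c")]
new_image, remaining_edits = checkpoint(old_image, edits)
```

After the checkpoint, `new_image` holds `/b` and `/c`, and the edit log starts empty again, so a restart only has to load the image instead of replaying a long log.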