Hadoop provides a cost-effective storage solution for businesses.
It enables businesses to easily access new data sources and tap into different types of data to derive value from that data.
It is a highly scalable storage platform.
Hadoop’s unique storage method is based on a distributed file system that essentially ‘maps’ data wherever it is located on the cluster. Because the data-processing tools usually run on the same servers where the data resides, processing is much faster.
Hadoop is now widely used across industries, including finance, media and entertainment, government, healthcare, information services, and retail.
Hadoop is fault tolerant. When data is sent to an individual node, it is also replicated to other nodes in the cluster, so if that node fails another copy is available for use.
Hadoop is more than just a faster, cheaper database and analytics tool. It is designed as a scale-out architecture that can affordably store all of a company’s data for later use.
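As a quick illustration of the replication behaviour described above, here is a minimal Java sketch using the standard Hadoop client API. It assumes a reachable cluster whose configuration is on the classpath; the file path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        // Cluster-wide default replication factor (dfs.replication, normally 3).
        Configuration conf = new Configuration();
        System.out.println("Default replication: " + conf.get("dfs.replication", "3"));

        // Per-file replication can be raised or lowered after the fact;
        // the NameNode then schedules extra copies on (or removes copies from) other DataNodes.
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/sample.txt");   // hypothetical path
        fs.setReplication(file, (short) 3);
        System.out.println("Replication of " + file + ": "
                + fs.getFileStatus(file).getReplication());
        fs.close();
    }
}
```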
What is Big Data?
Why are all industries talking about Big Data?
What are the issues in Big Data?
What are the challenges of storing Big Data?
What are the challenges of processing Big Data?
What technologies support Big Data?
What is Hadoop?
History of Hadoop
Why Hadoop?
Hadoop Use cases
Advantages and Disadvantages of Hadoop
Importance of the Different Hadoop Ecosystem Components
Importance of Integration with Other Big Data Solutions
Big Data Real-Time Use Cases
2. HDFS (Hadoop Distributed File System)
Daemons in Hadoop
NameNode
Secondary NameNode
DataNode
Data Storage in HDFS
HDFS Block size
HDFS Replication factor
Accessing HDFS (see the Java sketch at the end of this module)
HDFS Commands
Configurations
How to overcome the Drawbacks in HDFS
How to add new nodes (Commissioning)
How to remove existing nodes (Decommissioning)
How to verify the Dead Nodes
How to start the Dead Nodes
Introduction to NameNode Federation
Introduction to NameNode High Availability
Difference between Hadoop 1.x and Hadoop 2.x versions
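For the HDFS access topics above, a minimal Java sketch of the FileSystem API, assuming a running pseudo- or fully-distributed cluster whose configuration is on the classpath; the /user/demo paths are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsAccessDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        // Write a small file (roughly what `hadoop fs -put` does).
        Path file = new Path("/user/demo/hello.txt");            // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello HDFS\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back (roughly `hadoop fs -cat`).
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        // Inspect block size and replication factor for the file.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Replication: " + status.getReplication());

        // List a directory (roughly `hadoop fs -ls`).
        for (FileStatus entry : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(entry.getPath());
        }
        fs.close();
    }
}
```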
3. MapReduce
Importance of JobTracker
What are the roles of the JobTracker?
What are the drawbacks of the JobTracker?
Importance of TaskTracker
What are the roles of the TaskTracker?
What are the drawbacks of the TaskTracker?
What are the Data types in MapReduce?
Why are these important in MapReduce?
Can we write custom Data Types in MapReduce?
Text Input Format
Key Value Text Input Format
Sequence File Input Format
NLine Input Format
Importance of Input Format in Map Reduce
How to use Input Format in Map Reduce
How to write custom Input Formats and their Record Readers
Text Output Format
Sequence File Output Format
Importance of Output Format in Map Reduce
How to use Output Format in Map Reduce
How to write custom Output Formats and their Record Writers
What is a mapper in a MapReduce job?
Why do we need a mapper?
What are the Advantages and Disadvantages of a mapper?
Writing mapper programs
What is a reducer in a MapReduce job?
Why do we need a reducer?
What are the Advantages and Disadvantages of a reducer?
Writing reducer programs
What is a Driver in a MapReduce job?
Why do we need a Driver?
Writing Driver program
InputSplit
Need for InputSplit in MapReduce
InputSplit Size
InputSplit Size Vs Block Size
InputSplit Vs Mappers
Map Reduce Job execution flow
What is a combiner in a MapReduce job?
Why do we need a combiner?
What are the Advantages and Disadvantages of a Combiner?
Writing Combiner programs
Identity Mapper and Identity Reducer
What is a Partitioner in a MapReduce job?
Why do we need a Partitioner?
What are the Advantages and Disadvantages of a Partitioner?
Writing Partitioner programs
What is the Distributed Cache in a MapReduce job?
Importance of the Distributed Cache in a MapReduce job
What are the Advantages and Disadvantages of the Distributed Cache?
Writing Distributed Cache programs
What is a Counter in a MapReduce job?
Why do we need Counters in a production environment?
How to Write Counters in Map Reduce programs
How to write custom MapReduce keys using WritableComparable
How to write custom MapReduce values using Writable
Map Side Join
What is the importance of a Map Side Join?
Where do we use it?
Reduce Side Join
What is the importance of a Reduce Side Join?
Where do we use it?
What is the difference between a Map Side Join and a Reduce Side Join?
Importance of Compression techniques in production environment
Compression Types
NONE, RECORD and BLOCK
Compression Codecs
Default, Gzip, Bzip2, Snappy, and LZO
Enabling and Disabling these techniques for all the Jobs
Enabling and Disabling these techniques for a particular Job
How to write MapReduce jobs in Java (an end-to-end sketch appears at the end of this module)
Running the Map Reduce jobs in local mode
Running the Map Reduce jobs in pseudo mode
Running the Map Reduce jobs in cluster mode
How to debug MapReduce jobs locally
How to debug MapReduce jobs on a remote cluster
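A single end-to-end sketch tying together several of the topics above: mapper, reducer, driver, combiner, a custom counter, and Gzip output compression. It is the classic word-count pattern written against the Hadoop 2.x org.apache.hadoop.mapreduce API, not any particular production job; class names and paths are placeholders.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountJob {

    // Mapper: (byte offset, line) -> (word, 1)
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) {
                    context.getCounter("WordCount", "EMPTY_TOKENS").increment(1); // custom counter
                    continue;
                }
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also reused as the combiner): (word, [1,1,...]) -> (word, sum)
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable sum = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            sum.set(total);
            context.write(key, sum);
        }
    }

    // Driver: wires the pieces together and submits the job.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountJob.class);

        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);      // combiner = mini-reducer on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Compress the final output with Gzip (RECORD/BLOCK types apply to SequenceFile output).
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```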
4. YARN and Advanced MapReduce
What is YARN?
What is the importance of YARN?
Where can we use the concept of YARN in real-world projects?
What is the difference between YARN and MapReduce?
What is Data Locality?
Does Hadoop follow Data Locality?
What is Speculative Execution?
Does Hadoop perform Speculative Execution?
Importance of each command
How to execute the command
Explanation of MapReduce admin-related commands
Can we change the existing MapReduce configurations?
Importance of these configurations (a per-job override sketch appears at the end of this module)
Writing Unit Tests for Map Reduce Jobs
Use of Secondary Sorting and how to implement it in MapReduce
How to identify performance bottlenecks in MR jobs and how to tune them
Map Reduce Streaming and Pipes with examples
Exploring the Apache MapReduce Web UI
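As referenced above, a minimal sketch of overriding the stock MapReduce configuration per job, here toggling speculative execution; it assumes the standard Hadoop 2.x property names, and the job itself is just a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeExecutionConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Speculative execution launches backup attempts for slow ("straggler") tasks.
        // It is on by default; turn it off per job when tasks have side effects.
        conf.setBoolean("mapreduce.map.speculative", false);
        conf.setBoolean("mapreduce.reduce.speculative", false);

        // Data locality is handled by the scheduler automatically: it tries to place
        // each map task on a node (or at least a rack) holding the input block.

        Job job = Job.getInstance(conf, "speculative-execution-demo");
        System.out.println("map speculative: "
                + job.getConfiguration().getBoolean("mapreduce.map.speculative", true));
    }
}
```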
5. Apache Pig
Introduction to Apache Pig
Map Reduce Vs Apache Pig
SQL Vs Apache Pig
Different data types in Pig
Local Mode
Map Reduce Mode
Grunt Shell
Script
Embedded
How to write the UDF’s in Pig
How to use the UDF’s in Pig
Importance of UDF’s in Pig
How to write the Filter’s in Pig
How to use the Filter’s in Pig
Importance of Filter’s in Pig
How to write the Load Functions in Pig
How to use the Load Functions in Pig
Importance of Load Functions in Pig
How to use the Store Functions in Pig
Importance of Store Functions in Pig
Transformations in Pig
How to write complex Pig scripts
How to integrate Pig and HBase
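A minimal Java EvalFunc sketch for the Pig UDF topics above; the class name and the relation/field names in the usage comment are hypothetical.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// A trivial Pig UDF that upper-cases its first argument.
public class UpperCaseUdf extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                    // Pig treats null as "no result"
        }
        return input.get(0).toString().toUpperCase();
    }
}

// Usage from a Pig script (jar name, relation, and field are hypothetical):
//   REGISTER my-udfs.jar;
//   names_upper = FOREACH names GENERATE UpperCaseUdf(name);
```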
6. Apache Hive
Introduction to Hive architecture
Driver
Compiler
Semantic Analyzer
Hive Integration with Hadoop
Hive Query Language (HiveQL)
SQL vs HiveQL
Hive Installation and Configuration
Hive, Map-Reduce and Local-Mode
Hive DDL and DML Operations
CLI
HiveServer
HWI (Hive Web Interface)
Embedded metastore configuration
External metastore configuration
How to write the UDF’s in Hive
How to use the UDF’s in Hive
Importance of UDF’s in Hive
How to use the UDAF’s in Hive
Importance of UDAF’s in Hive
How to use the UDTF’s in Hive
Importance of UDTF’s in Hive
How to write complex Hive queries
What is Hive Data Model?
Importance of Hive Partitions in production environment
Limitations of Hive Partitions
How to write Partitions
Importance of Hive Buckets in production environment
How to write Buckets
Importance of Hive SerDe’s in production environment
How to write SerDe programs
How to integrate Hive and HBase
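For the Hive UDF topics above, a minimal sketch using the classic org.apache.hadoop.hive.ql.exec.UDF style (the simpler of Hive's UDF APIs); the class name, jar path, function name, and table are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial Hive UDF that strips leading/trailing whitespace from a string column.
public class TrimUdf extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim());
    }
}

// Usage from the Hive CLI:
//   ADD JAR /tmp/my-hive-udfs.jar;
//   CREATE TEMPORARY FUNCTION trim_udf AS 'TrimUdf';
//   SELECT trim_udf(name) FROM customers;
```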
7. Apache ZooKeeper
Introduction to ZooKeeper
Pseudo-mode installation
ZooKeeper cluster installation
Basic command execution
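A minimal Java client sketch for the basic ZooKeeper operations above, assuming a server listening on localhost:2181; the znode path and data are placeholders.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkBasicsDemo {
    public static void main(String[] args) throws Exception {
        // Connect to a local standalone/pseudo-mode server and wait until the session is live.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Equivalent of `create /demo hello` in zkCli.sh
        String path = "/demo";
        if (zk.exists(path, false) == null) {
            zk.create(path, "hello".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
        }

        // Equivalent of `get /demo`
        byte[] data = zk.getData(path, false, null);
        System.out.println("Data at " + path + ": " + new String(data));

        zk.close();
    }
}
```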
8. Apache HBase
HBase introduction
HBase use cases
HBase basics
Column families
Scans
Local mode
Pseudo mode
Cluster mode
Storage
Write Ahead Log
Log Structured Merge Trees
MapReduce over HBase
Key design
Bloom Filters
Versioning
Coprocessors
Filters
REST
Thrift
Hive
Web Based UI
Schema definition
Basic CRUD operations
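A minimal Java sketch of the basic CRUD operations above, using the HBase 1.x+ client API; the table name, column family, and values are hypothetical, and the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {   // hypothetical table

            byte[] cf = Bytes.toBytes("info");   // hypothetical column family

            // Create / update
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Read
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println("name = "
                    + Bytes.toString(result.getValue(cf, Bytes.toBytes("name"))));

            // Delete
            table.delete(new Delete(Bytes.toBytes("row1")));
        }
    }
}
```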
9. Apache Sqoop
Introduction to Sqoop
MySQL client and Server Installation
Sqoop Installation
How to connect to a relational database using Sqoop
Sqoop commands and examples for Import and Export
10. Apache Flume
Introduction to Flume
Flume installation
Flume agent usage and execution of Flume examples
11. Apache Oozie
Introduction to Oozie
Oozie installation
Executing Oozie workflow jobs
Monitoring Oozie workflow jobs
12. MongoDB
Introduction to MongoDB
MongoDB installation
MongoDB examples
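A minimal Java sketch for the MongoDB examples above, using the synchronous MongoDB Java driver and assuming a mongod instance on localhost:27017; the database, collection, and fields are hypothetical.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class MongoDemo {
    public static void main(String[] args) {
        // Connect to a local mongod instance.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("training");            // hypothetical database
            MongoCollection<Document> users = db.getCollection("users");  // hypothetical collection

            // Insert one document.
            users.insertOne(new Document("name", "alice").append("city", "Hyderabad"));

            // Query it back.
            Document found = users.find(new Document("name", "alice")).first();
            System.out.println(found == null ? "not found" : found.toJson());
        }
    }
}
```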