Syllabus:

Unit I: Introduction To Big Data:

Introduction to Big Data, Characteristics of Big Data, Traits of Big Data, Challenges of Conventional Systems (TB-1, Ch-1), Sources of Big Data (TB-1, Ch-3).

Evolution of Analytic Scalability (TB-1, Ch-4), Analytic Processes and Tools (TB-1, Ch-5 & 6), Analysis vs Reporting, Modern Data Analytic Tools.

Statistical Concepts: Sampling Distributions - Re-Sampling - Statistical Inference - Prediction Error (TB-1, Ch-7).

Unit II: Big Data In Enterprise:

Problems with traditional large-scale systems, Big Data in enterprise, Comparison with other systems (TB-1, Ch-10), Hadoop Framework (TB-2, Ch-1).

Unit III: Introduction To Hadoop:

History of Hadoop, Data Storage and Analysis (TB-2, Ch-1).

Setting up Hadoop - Pseudo-distributed mode - Cluster mode - IPv6 - Installation of Java and Hadoop - Configuration of Hadoop (TB-2, Ch-10).

Unit IV: HDFS:

The Hadoop Distributed File System - HDFS Design and Architecture - HDFS Concepts - Interacting with HDFS using the command line - Interacting with HDFS using Java APIs - Dataflow - Blocks - Replicas - Hadoop Processes - NameNode - Secondary NameNode - JobTracker - TaskTracker - DataNode (TB-2, Ch-3).

Unit V: MapReduce:

How MapReduce Works - Anatomy of a Hadoop Cluster - Hadoop Ecosystem Components - Developing a MapReduce Application - Phases in the MapReduce Framework - MapReduce Input and Output Formats - Introduction to Writing a MapReduce Program - The MapReduce Flow - Examining a Sample MapReduce Program - Basic MapReduce API Concepts - The Driver Code - The Mapper - The Reducer (TB-2, Ch-6, 7 & 8).

Introduction to Languages and Databases

Hadoop Programming Languages: Pig, Hive (TB-2, Ch-16 & 17).

NoSQL Databases: Cassandra, MongoDB, Cloudera, CouchDB, HBase.