Syllabus:
Unit I: Introduction to Big Data:
Introduction to Big Data, Characteristics of Big Data, Traits of Big Data, Challenges of Conventional Systems (TB-1,Ch-1), Sources of Big Data (TB-1,Ch-3).
Evolution of Analytic Scalability (TB-1,Ch-4), Analytic Processes and Tools (TB-1,Ch-5&6), Analysis vs Reporting, Modern Data Analytic Tools,
Statistical Concepts: Sampling Distributions - Re-Sampling - Statistical Inference - Prediction Error.
(TB-1,Ch-7)
Unit II: Big Data In Enterprise:
Problems with traditional large-scale systems, Big Data in enterprise, Comparison with other systems (TB-1,Ch-10), Hadoop Framework
(TB-2,Ch-1)
Unit III: Introduction To Hadoop:
History of Hadoop, Data Storage and Analysis (TB-2,Ch-1),
Hadoop - Setup Hadoop - Pseudo mode - Cluster mode - IPv6 - Installation of Java, Hadoop - Configurations of Hadoop
(TB-2,Ch-10)
Unit IV: HDFS:
The Hadoop Distributed File System - HDFS Design and Architecture - HDFS Concepts - Interacting with HDFS using the command line - Interacting with HDFS using Java APIs - Dataflow - Blocks - Replica - Hadoop Processes - NameNode - Secondary NameNode - JobTracker - TaskTracker - DataNode (TB-2,Ch-3)
Unit V: MapReduce:
How MapReduce Works - Anatomy of a Hadoop Cluster - Hadoop Ecosystem Components - Developing a MapReduce Application - Phases in the MapReduce Framework - MapReduce Input and Output Formats - Introduction to Writing a MapReduce Program - The MapReduce Flow - Examining a Sample MapReduce Program - Basic MapReduce API Concepts - The Driver Code - The Mapper - The Reducer (TB-2,Ch-6, 7 & 8)
Introduction to Languages and Databases
Hadoop Programming Languages: Pig, Hive (TB-2,Ch-16&17)
NoSQL Databases: Cassandra, MongoDB, Cloudera, CouchDB, HBase