One Week National Workshop on

Data Analytics using Hadoop (DAH-2019)

(Oct 14th - 18th, 2019)

Organized by:

Department of CS&IT, Mahatma Gandhi Central University, Bihar

Important Dates:

Registration Opens: Sept. 20th, 2019

Registration Closes: Oct. 11th, 2019 (05:00 PM), or earlier if all seats are filled

Extended Last Date for Registration: Oct. 13th, 2019 (12:00 AM)

Workshop Dates: Oct. 14th - 18th, 2019


Syllabus:

1) Introduction to Big Data

Learning Objectives: In this module you will understand what Big Data is, why there is so much hype around it, what types of problems Big Data can solve, what the current mega-projects in the Big Data space are, how it is becoming a major business opportunity, and what the best practices of Big Data are.

Topics: What is Big Data?, Why Big Data?, Evolution of Big Data, Big Data in Science and Research, Big Data in Government, Big Data in the Private Sector, Big Data as a Business Opportunity, Some Use Cases of Big Data, Big Data Critique, Big Data Best Practices.

2) What is the difference between Business Intelligence, Analytics and Big Data?

In this module you will get a clear idea of the distinctions between BI, Analytics, and Big Data, and learn how these data science technologies co-exist rather than replace each other, as the media sometimes speculates.

3) Framework of Big Data and Data Science Study

In this module you will learn about the various disciplines of data science and how they are interlinked. You will also learn about various learning and job-related opportunities and how you can transition yourself into next-generation roles.

4) Big Data and Hadoop Architecture

In this module you will learn the high-level architecture behind Big Data and how Hadoop became the de facto standard for Big Data. You will understand why traditional database technologies were not enough to handle such large volumes of data, and how Hadoop solves this problem.

Topics: MapReduce Architecture, SMAQ Stack for Big Data.
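
To make the MapReduce model concrete, here is a minimal word-count job in Java, the canonical Hadoop example. It is a sketch only: the class names are ours, and the input and output paths are supplied on the command line.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input line.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reduce phase: sum the counts emitted for each word.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // local pre-aggregation on map side
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }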

5) Data Loading Techniques

In this module you will learn about the data loading tools Flume and Sqoop.

Topics: Flume, Sqoop
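
Sqoop is normally driven from the command line (sqoop import ...); as a sketch of what such an import does, the snippet below triggers the same import from Java, assuming Sqoop 1.4.x and its org.apache.sqoop.Sqoop entry point. The JDBC URL, credentials, table, and target directory are placeholders.

    import org.apache.sqoop.Sqoop;

    public class SqoopImportExample {
        public static void main(String[] args) {
            // Equivalent to: sqoop import --connect ... --table ... --target-dir ...
            // All connection details below are placeholders; adjust to the lab database.
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://localhost:3306/salesdb",
                "--username", "student",
                "--password", "student",
                "--table", "orders",
                "--target-dir", "/user/dah2019/orders",
                "-m", "1"                      // a single mapper is enough for a small table
            };
            int exitCode = Sqoop.runTool(importArgs);
            System.exit(exitCode);
        }
    }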

6) Pig and Pig Latin

You will learn why Pig is an important component of the Hadoop framework and how Pig makes it easier to create MapReduce programs. You will learn Pig Latin commands and be able to write Pig Latin scripts.

Topics: Introduction to Pig, Pig Keywords, Pig Installation and Execution, Basic Pig Commands, Exercise on Data Processing in Pig, Uploading Data Files, Creating Scripts, Running Scripts.
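
Pig Latin scripts usually run from the Grunt shell; to give a feel for the language, the sketch below embeds a few Pig Latin statements in Java via PigServer. The input file, field names, and output path are illustrative.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigEmbeddedExample {
        public static void main(String[] args) throws Exception {
            // ExecType.LOCAL runs against local files; ExecType.MAPREDUCE runs on the cluster.
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Register Pig Latin statements one by one; nothing executes until a STORE/DUMP.
            pig.registerQuery("orders = LOAD 'orders.csv' USING PigStorage(',') "
                    + "AS (id:int, customer:chararray, amount:double);");
            pig.registerQuery("big = FILTER orders BY amount > 100.0;");
            pig.registerQuery("by_customer = GROUP big BY customer;");
            pig.registerQuery("totals = FOREACH by_customer GENERATE group, SUM(big.amount);");

            // Materialize the result; this triggers the actual execution.
            pig.store("totals", "totals_out");
            pig.shutdown();
        }
    }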

7) Hive and HiveQL

In this module you will learn about Hive and HiveQL, how to install Apache Hive, how to load and query data in Hive, and so on.

Topics: Hive Architecture, Hive Installation Steps, Hive Data Model, Hive Data Types, How to Process Data with Apache Hive (Exercise on Hive), Loading Data into Hive Tables from Queries, Partitions and Buckets, Create/Drop/Alter Database, Querying and Inserting Data
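
Besides the Hive shell, HiveQL can be issued from Java over JDBC against HiveServer2. The sketch below creates a table, loads a file already in HDFS, and runs an aggregate query; the host, port, credentials, and table names are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
        public static void main(String[] args) throws Exception {
            // Register the HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://localhost:10000/default"; // placeholder endpoint
            try (Connection conn = DriverManager.getConnection(url, "student", "");
                 Statement stmt = conn.createStatement()) {

                // Create a table and load data from a file already in HDFS.
                stmt.execute("CREATE TABLE IF NOT EXISTS orders "
                        + "(id INT, customer STRING, amount DOUBLE) "
                        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
                stmt.execute("LOAD DATA INPATH '/user/dah2019/orders.csv' INTO TABLE orders");

                // Query the table and print the results.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT customer, SUM(amount) FROM orders GROUP BY customer")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                    }
                }
            }
        }
    }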

8) NoSQL & HBase

You will learn the importance of NoSQL databases, about HBase in detail, and how to load and query data in HBase. You will learn about ZooKeeper, why HBase uses ZooKeeper, and see an example ZooKeeper application.

Topics: NoSQL Databases, Introduction to HBase, HBase Architecture Details, Introduction to ZooKeeper, Data Model and the Hierarchical Namespace, Nodes and Ephemeral Nodes.
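
Here is a short sketch of the HBase Java client API (HBase 1.x style) writing and reading one cell; the table ("orders" with column family "d") is assumed to already exist. Note that the client only needs the ZooKeeper quorum, because HBase uses ZooKeeper to locate its region servers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            // The client finds HBase via the ZooKeeper quorum in the configuration.
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost"); // placeholder

            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("orders"))) {

                // Write one cell: row key "row1", column family "d", qualifier "amount".
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"),
                        Bytes.toBytes("120.5"));
                table.put(put);

                // Read the cell back.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("amount"));
                System.out.println(Bytes.toString(value));
            }
        }
    }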

9) Oozie

In this module you will learn about Oozie, the importance of a workflow scheduler, and how to define, package, and deploy an Oozie workflow.

Topics: Introduction to Oozie, Defining an Oozie Workflow, Packaging and Deploying an Oozie Workflow Application.
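
Oozie workflows are defined in workflow.xml and usually submitted with the oozie CLI; as a sketch, the snippet below submits one from Java via the Oozie client API instead. The server URL, application path, and NameNode/ResourceManager addresses are placeholders.

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    public class OozieSubmitExample {
        public static void main(String[] args) throws Exception {
            // The Oozie server URL is a placeholder.
            OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

            // These properties play the role of -config job.properties on the CLI.
            Properties props = oozie.createConfiguration();
            props.setProperty(OozieClient.APP_PATH, "hdfs://localhost:8020/user/dah2019/app");
            props.setProperty("nameNode", "hdfs://localhost:8020");
            props.setProperty("jobTracker", "localhost:8032");

            // Submit and start the workflow; run() returns the job id.
            String jobId = oozie.run(props);
            System.out.println("Submitted workflow " + jobId);

            // Poll the workflow status.
            WorkflowJob job = oozie.getJobInfo(jobId);
            System.out.println("Status: " + job.getStatus());
        }
    }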

10) Hadoop 2.0, MRv2 (or YARN)

In this module, you will learn the new features in Hadoop 2.0, namely YARN (also called MRv2), NameNode High Availability, HDFS Federation, etc.

Topics: Fair Scheduler, Capacity Scheduler, NameNode High Availability, Introduction to YARN, Programming in the YARN Framework.
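
As a small taste of the YARN client API, the sketch below connects to the ResourceManager and lists the applications it knows about; a full YARN application (ApplicationMaster plus containers) builds on these same client classes.

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class YarnListApps {
        public static void main(String[] args) throws Exception {
            // Reads the ResourceManager address etc. from yarn-site.xml on the classpath.
            YarnClient yarn = YarnClient.createYarnClient();
            yarn.init(new YarnConfiguration());
            yarn.start();

            // One report per application known to the ResourceManager.
            for (ApplicationReport app : yarn.getApplications()) {
                System.out.println(app.getApplicationId() + "\t"
                        + app.getName() + "\t"
                        + app.getYarnApplicationState());
            }
            yarn.stop();
        }
    }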

11) Labs

This course includes many lab exercises. These labs will give you hands-on experience with Hadoop and help you gain practical knowledge, from beginner to advanced level.

Lab 1: Hadoop

o Installation of Hadoop

o Installation of required software such as VMware

o Starting the Virtual Machine and Hadoop

o Test to check if Hadoop is running properly

Lab 2: HDFS

o To demonstrate how data is stored in HDFS.

o Upload File to HDFS

o General HDFS commands like ls, du, cd, tail, mkdir, put, etc. (see the Java sketch below)
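
The lab drives HDFS through the hadoop fs shell; the same operations are also available from Java through the FileSystem API, as in this sketch (the paths are placeholders).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            // Reads fs.defaultFS from core-site.xml on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());

            // mkdir + put: create a directory and upload a local file into it.
            Path dir = new Path("/user/dah2019/data");
            fs.mkdirs(dir);
            fs.copyFromLocalFile(new Path("orders.csv"), new Path(dir, "orders.csv"));

            // ls: list the directory contents with file sizes.
            for (FileStatus status : fs.listStatus(dir)) {
                System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
            }
            fs.close();
        }
    }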

Lab 3: Loading Data to HDFS

o Loading data from the local system (using put, etc.)

o Loading data from a database to HDFS using Sqoop

o Exporting data from HDFS to a database using Sqoop

Lab 4: Getting Started with Pig & Pig Latin

o Execute some basic Pig commands, load data into a relation, and store a relation into a folder in HDFS using different formats.

o Run the Pig scripts

o Define schemas, then store, analyse, and query the datasets.

Lab 5: Hive

o Create a Hive table and load data into it from a file

o Create partitions

o Query the data stored as a table and extract some useful insights

12) Two Projects