HadoopExam Learning Resources
Welcome to the HadoopExam Spark 2.x SQL Professional Training with Hands-On Sessions.

To access this training, you must have a subscription from www.HadoopExam.com.
Please check here to get full training access.

This page is for the Spark 2.x SQL Professional Training.
  • Use Sign In to log in with your permitted email ID.
  • Use the Pedagogy navigation to watch the individual problem and solution videos.
Syllabus covered as part of this training (become a Spark 2.x SQL expert in around 8+ hours of training):

Module-1 : Introduction to Spark SQL (PDF Download Available, Length 37 Minutes)
  • What is new in Spark SQL
  • What are the main goals of SparkSQL
  • Getting started with the Dataset/DataFrame API
  • Data Pipelines
  • Introduction to UDF
Module-2 : Introduction to Apache Spark SQL's Catalyst Optimizer (PDF Download Available, Length 38 Minutes)
  • What is the Catalyst optimizer
  • Concepts of Trees and Rules
  • Various Phases of Catalyst optimizer
    • Analysis
    • Logical optimization
    • Physical planning
    • Code Generation
  • Scala features and concepts
    • Predicate Pushdown
    • Constant Folding
    • Physical operators
    • Projection Pruning
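The tree-and-rule idea behind Catalyst, and the constant folding rule in particular, can be sketched in a few lines of plain Python. This is not Spark code: the node classes (`Literal`, `Attribute`, `Add`, `Multiply`) and the helpers `transform_up` and `constant_folding` are hypothetical names chosen for illustration, loosely modeled on how Catalyst applies pattern-matching rules to an expression tree.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical expression-tree nodes (illustration only, not Spark classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Multiply:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add, Multiply]


def transform_up(expr: Expr, rule: Callable[[Expr], Expr]) -> Expr:
    """Apply `rule` bottom-up: rewrite the children first, then the node itself."""
    if isinstance(expr, (Add, Multiply)):
        expr = type(expr)(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def constant_folding(expr: Expr) -> Expr:
    """Rule: an operator whose two inputs are both literals becomes one literal."""
    if isinstance(expr, (Add, Multiply)) and isinstance(expr.left, Literal) and isinstance(expr.right, Literal):
        if isinstance(expr, Add):
            return Literal(expr.left.value + expr.right.value)
        return Literal(expr.left.value * expr.right.value)
    return expr  # rule does not match: leave the node unchanged


# x + (1 + 2) * 3 folds to x + 9; the column reference x cannot be folded.
tree = Add(Attribute("x"), Multiply(Add(Literal(1), Literal(2)), Literal(3)))
print(transform_up(tree, constant_folding))
```

Because the rule is applied bottom-up, the inner `1 + 2` is folded to `3` before its parent `3 * 3` is visited, so a single pass folds the whole constant subtree.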
Module-3 : Introduction to Project Tungsten (PDF Download Available, Length 31 Minutes)
  • Purpose of Project Tungsten
  • Binary Processing
  • Cache-Aware Computation
  • Code Generation
  • Custom Memory Management
Module-4 : Create a Spark Environment Using Ubuntu Linux on Windows

Module-4A : Step-by-Step Installation of VMware Workstation Player (PDF Download Available, Length 8 Minutes, Hands-On)

Module-4B : Install Ubuntu Linux in VMware Player (PDF Download Available, Length 23 Minutes, Hands-On)
  • Install Ubuntu Image
  • Install SSH server
  • Install PuTTY and connect to the Linux OS
Module-4C : Install Apache Spark (PDF Download Available, Length 17 Minutes, Hands-On)
  • Install Apache Spark
  • Start spark-shell
  • Start pyspark
Module-5A : DataFrame and Dataset API Introduction (PDF Download Available, Length 19 Minutes, Concepts)
  • DataFrame Introduction
  • Dataset Introduction
  • SparkSQL with DataFrame and Dataset
Module-5B : Exercises Using the RDD and DataFrame/Dataset APIs (PDF Download Available, Length 14 Minutes, Hands-On)

Module-6 : JSON Dataset Exercise with Step-by-Step Explanation (PDF Download Available, Length 21 Minutes, Hands-On)
  • Hive query support exercise
  • Loading JSON data
  • Explicitly assigning a schema to JSON data
  • Working with loaded JSON data
Module-7 : Spark SQL Encoders in Detail (PDF Download Available, Length 18 Minutes, Concepts)
  • Implicit Objects
  • Encoders (special SerDe for SparkSQL)
  • Why Encoders are fast
  • Creating custom Encoders
Module-8 : Encoders Exercise (PDF Download Available, Length 8 Minutes, Hands-On)

Module-9 : DataFrame and Dataset in Depth (PDF Download Available, Length 18 Minutes, Concepts)
  • Dataset operation variants
  • Dataset and compile-time checks
  • Dataset transient values
  • Converting DataFrame to Dataset
  • Dataset Using Case classes
  • Dataset using Programmatic Schema
Module-10 : Dataset and DataFrame Exercises (PDF Download Available, Length 18 Minutes, Hands-On)
  • Dataset API in three different formats
  • Go through Explain Plans
  • Dataset methods
  • Work with DataFrames
Module-11 : Apache SparkSQL Schema Creation and Understanding (PDF Download Available, Length 24 Minutes, Hands-On)
  • SparkSQL StructType and StructField
  • Inferring Schema
  • Printing Schema in various ways
  • Nested StructType
  • Row object and accessing fields
Module-12A : DataFrameReader and DataFrameWriter (PDF Download Available, Length 24 Minutes, Hands-On)
  • RowObject and RowSchema
  • RowEncoder
  • DataFrameReader Interface
  • DataFrameWriter Interface
  • Schema inference and compression
  • Dataset using Programmatic Schema
Module-12B : Apache SparkSQL Schema Creation and Understanding (PDF Download Available, Length 28 Minutes, Hands-On)
  • RowObject and RowSchema
  • RowEncoder
  • DataFrameReader Interface
  • DataFrameWriter Interface
Module-13 : Dataset Caching and Checkpointing (PDF Download Available, Length 30 Minutes, Hands-On)
  • Dataset Caching
  • Dataset Unpersisting
  • Dataset Eager checkpoint
  • Dataset Non-Eager checkpoint
  • Dataset Lineage truncation
  • Dataset Performance Improvements
  • DataFrame Caching
  • DataFrame Unpersisting
  • Check the Storage tab in the Spark UI
Module-14 : Dataset Joins (PDF Download Available, Length 25 Minutes, Hands-On)
  • Broadcast joins
  • Dataset joins
  • Dataset Joins and Hints
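A broadcast join avoids shuffling the large side: the small side is copied to every executor, turned into a hash table, and each partition of the large side probes it locally. The plain-Python sketch below shows only that build-and-probe logic for an inner equi-join on one machine; `broadcast_hash_join` and the sample rows are hypothetical. In Spark itself you would request this plan with the `broadcast()` function from `org.apache.spark.sql.functions`, or let Spark choose it automatically for tables smaller than `spark.sql.autoBroadcastJoinThreshold`.

```python
from collections import defaultdict


def broadcast_hash_join(large, small, large_key, small_key):
    """Inner equi-join of two lists of dicts using a hash table built on `small`."""
    # Build phase: index the small side by its join key.
    # (In Spark, this is the side that gets broadcast to every executor.)
    table = defaultdict(list)
    for row in small:
        table[row[small_key]].append(row)

    # Probe phase: stream the large side and look up matches locally,
    # so the large side never has to be shuffled by key.
    joined = []
    for row in large:
        for match in table.get(row[large_key], []):
            joined.append({**row, **match})
    return joined


orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 20},
    {"order_id": 3, "cust_id": 30},  # no matching customer: dropped by the inner join
]
customers = [{"cust_id": 10, "name": "Ann"}, {"cust_id": 20, "name": "Bob"}]

print(broadcast_hash_join(orders, customers, "cust_id", "cust_id"))
```

The design only pays off when the broadcast side fits comfortably in each executor's memory, which is why Spark limits automatic broadcasting with a size threshold.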
Module-15 : RelationalGroupedDataset (PDF Download Available, Length 29 Minutes, Hands-On)
  • GroupBy operations on Datasets
  • Different types of aggregate functions
  • Dataset Union function
  • Grouping Sets example
Module-16 : SparkSQL Rollup, Cube and Pivot Operations (PDF Download Available, Length 20 Minutes, Hands-On)
  • Rollup operations
  • Pivot Operations
  • Cube operations
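ROLLUP and CUBE are shorthand for a set of groupings (the same grouping-sets idea covered in Module-15): ROLLUP(a, b) aggregates over (a, b), (a) and the grand total, while CUBE(a, b) aggregates over every subset of the columns. The plain-Python sketch below expands both and then sums a measure per grouping set, using `None` where Spark would output `null` for a rolled-up column. The function names and sample rows are hypothetical; unlike Spark, this sketch does not track `grouping_id`, so it assumes the grouping columns themselves contain no `None` values.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1, ..., cn): every prefix of the column list, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1, ..., cn): every subset of the column list, largest first."""
    return [combo for r in range(len(cols), -1, -1) for combo in combinations(cols, r)]


def aggregate(rows, cols, grouping_sets, measure):
    """Sum `measure` once per grouping set; columns outside the set become None."""
    totals = {}
    for gs in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gs else None for c in cols)
            totals[key] = totals.get(key, 0) + row[measure]
    return totals


dims = ["country", "city"]
sales = [
    {"country": "US", "city": "NYC", "sales": 10},
    {"country": "US", "city": "LA", "sales": 5},
    {"country": "IN", "city": "Pune", "sales": 7},
]

print(rollup_sets(dims))  # [('country', 'city'), ('country',), ()]
print(cube_sets(dims))    # [('country', 'city'), ('country',), ('city',), ()]
print(aggregate(sales, dims, rollup_sets(dims), "sales"))
```

The rollup output contains per-city rows, per-country subtotals such as `('US', None)`, and the grand total `(None, None)`; the cube output additionally contains per-city totals across all countries.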
Module-17 : Spark SQL Functions (PDF Download Available, Length 20 Minutes, Hands-On)
  • Standard available functions
  • User-defined functions (UDFs)
  • Window aggregate functions
  • Creating UDFs inline and explicitly
  • Understanding UDFs and aggregate UDFs
  • Defining and registering UDFs
  • Defining aggregate UDFs
  • Using custom-created UDFs
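An aggregate UDF is usually split into four steps so that it can run in parallel: create an empty buffer, fold each input row into a per-partition buffer, merge the partial buffers, and produce the final value. The plain-Python sketch below computes an average that way. Its `zero`/`reduce`/`merge`/`finish` method names mirror Spark's typed `Aggregator` class (`org.apache.spark.sql.expressions.Aggregator`), but the `Average` class, the `AvgBuffer` type and the hard-coded partitions are illustrative assumptions, not Spark API.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class AvgBuffer:
    """Intermediate state: running sum and row count."""
    total: float = 0.0
    count: int = 0


class Average:
    """Hypothetical average aggregator with an Aggregator-like shape."""

    def zero(self) -> AvgBuffer:
        # Empty buffer; must be neutral for merge.
        return AvgBuffer()

    def reduce(self, buf: AvgBuffer, value: float) -> AvgBuffer:
        # Fold one input value into a partition-local buffer.
        return AvgBuffer(buf.total + value, buf.count + 1)

    def merge(self, b1: AvgBuffer, b2: AvgBuffer) -> AvgBuffer:
        # Combine buffers computed on different partitions.
        return AvgBuffer(b1.total + b2.total, b1.count + b2.count)

    def finish(self, buf: AvgBuffer):
        # Final result; None for an empty input.
        return buf.total / buf.count if buf.count else None


agg = Average()
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]  # stand-in for data split across executors
partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
result = agg.finish(reduce(agg.merge, partials, agg.zero()))
print(result)  # 3.5
```

Keeping a (sum, count) buffer rather than a running average is what makes `merge` correct: two partial averages cannot be combined without knowing how many rows each one covers.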
Module-18 : SparkSQL Rank and Cumulative Distribution Functions (PDF Download Available, Length 12 Minutes, Hands-On)
  • Understanding and hands-on practice with rank functions
  • Creating a window spec
  • Calculating cumulative distribution values using CDF
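For one window partition ordered by a column, `rank()` leaves gaps after ties, `dense_rank()` does not, and `cume_dist()` returns the fraction of rows whose ordering value is less than or equal to the current row's. The plain-Python sketch below reproduces those three definitions for a single partition sorted in ascending order; in Spark you would instead apply the built-in functions over a `WindowSpec` built with `Window.partitionBy(...).orderBy(...)`. The helper name `rank_functions` is hypothetical.

```python
def rank_functions(values):
    """Return (value, rank, dense_rank, cume_dist) for one window partition
    ordered ascending by value."""
    ordered = sorted(values)
    n = len(ordered)
    results = []
    rank = dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            # A new distinct value: rank jumps to the row position (leaving gaps
            # after ties), while dense_rank only advances by one.
            rank = position
            dense_rank += 1
            previous = value
        # cume_dist counts the current row's peers (ties) as well.
        at_or_below = sum(1 for other in ordered if other <= value)
        results.append((value, rank, dense_rank, at_or_below / n))
    return results


for row in rank_functions([30, 10, 20, 20]):
    print(row)
```

With the tied value 20, both rows get rank 2 and `cume_dist` 0.75; the next value, 30, gets rank 4 but dense rank 3.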
Module-19 : Row Number and Lead-Lag Window Functions (PDF Download Available, Length 12 Minutes, Hands-On)
  • Understanding row number functions
  • Understanding lead-lag functions
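`row_number()` numbers rows 1, 2, 3, ... in window order, while `lag(col, offset, default)` and `lead(col, offset, default)` read the value `offset` rows before or after the current row, falling back to `default` (null when none is given) at the partition edges. The sketch below shows that behavior in plain Python for a single partition whose rows are already in window order; the helper name `row_number_lead_lag` and the `daily_sales` data are hypothetical.

```python
def row_number_lead_lag(rows, offset=1, default=None):
    """For one partition already in window order, return
    (row_number, value, lag, lead) for each row."""
    n = len(rows)
    results = []
    for i, value in enumerate(rows):
        # Look back / ahead by `offset` rows, using `default` past the edges.
        lag = rows[i - offset] if i - offset >= 0 else default
        lead = rows[i + offset] if i + offset < n else default
        results.append((i + 1, value, lag, lead))
    return results


daily_sales = [100, 120, 90]
for row in row_number_lead_lag(daily_sales):
    print(row)
```

A common use is period-over-period change, for example `value - lag` for each row, where the first row has no previous value and therefore gets the default.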