Welcome to the HadoopExam Spark 2.x SQL Professional Training with HandsOn Sessions.
To access this training, you must have a subscription from www.HadoopExam.com
Please check here to get full training access
This page is mainly for Spark 2.x SQL Professional Training.
Use SignIn to log in with your permitted email ID
Use the Pedagogy Navigation to watch individual problem and solution videos
Syllabus covered as part of this training (become a Spark 2.x SQL expert in 8+ hours of training)
Module-1 : Introduction to Spark SQL ( PDF Download & Available Length 37 Minutes)
What is new in Spark SQL
What are the main Goals of SparkSQL
Start of Dataset/DataFrame API
Data Pipelines
Introduction to UDF
Module-2 : Introduction to Apache Spark SQL's Catalyst optimizer ( PDF Download & Available Length 38 Minutes)
What is the Catalyst optimizer
Concepts of Tree and Rules
Various Phases of Catalyst optimizer
Analysis
Logical optimization
Physical planning
Code Generation
Scala Features concepts
Predicate Pushdown
Constant Folding
Physical operator
Project Pruning
Module-3 : Introduction to Project Tungsten ( PDF Download & Available Length 31 Minutes)
Purpose of project Tungsten
Binary Processing
Cache Aware Computation
Code Generation
Custom Memory Management
Module-4 : Create Spark Environment using Ubuntu Linux on Windows
Module-4A : Step by Step Installation of VMware Workstation Player ( PDF Download & Available Length 8 Minutes & HandsOn )
Module-4B : Install Ubuntu Linux in VMware Player ( PDF Download & Available Length 23 Minutes & HandsOn )
Install Ubuntu Image
Install SSH server
Install PuTTY and connect to Linux OS
Module-4C : Install Apache Spark ( PDF Download & Available Length 17 Minutes & HandsOn )
Install Apache Spark
Start spark-shell
Start pyspark
Module-5A : DataFrame and Dataset API Introduction ( PDF Download & Available Length 19 Minutes & Concepts )
DataFrame Introduction
Dataset Introduction
SparkSQL with DataFrame and Dataset
Module-5B : Exercise using RDD and DataFrame/Dataset API (PDF Download & Available Length 14 Minutes & HandsOn )
Module-6 : JSON Dataset Exercise and Step-by-Step Explanation ( PDF Download & Available Length 21 Minutes & HandsOn )
Support for Hive Query Exercise
Loading JSON data
Explicitly assign Schema to JSON Data
Working with loaded JSON data
Module-7 : Spark SQL Encoders in Detail ( PDF Download & Available Length 18 Minutes & Concepts )
Implicit Objects
Encoders (Special ser-de for SparkSQL)
Why Encoders are fast
Creating custom Encoders
Module-8 : Encoders Exercise ( PDF Download & Available Length 8 Minutes & HandsOn )
Module-9 : DataFrame and Dataset in Depth ( PDF Download & Available Length 18 Minutes & Concepts )
Dataset operation variants
Dataset and compile time check
Dataset transient values
Converting DataFrame to Dataset
Dataset Using Case classes
Dataset using Programmatic Schema
Module-10 : Dataset And DataFrame Exercises ( PDF Download & Available Length 18 Minutes & HandsOn )
Dataset API in three different formats
Go through Explain Plans
Dataset methods
Work with DataFrames
Module-11 : Apache SparkSQL Schema creation and understanding ( PDF Download & Available Length 24 Minutes & HandsOn )
SparkSQL StructType and StructField
Inferring Schema
Printing Schema in various ways
Nested StructType
Row object and accessing fields
Module-12A : DataFrameReader and DataFrameWriter ( PDF Download & Available Length 24 Minutes & HandsOn )
RowObject and RowSchema
RowEncoder
DataFrameReader Interface
DataFrameWriter Interface
Schema inference and compression
Dataset using Programmatic Schema
Module-12B : Apache SparkSQL Schema creation and understanding ( PDF Download & Available Length 28 Minutes & HandsOn )
RowObject and RowSchema
RowEncoder
DataFrameReader Interface
DataFrameWriter Interface
Module-13 : Dataset Caching and Checkpointing ( PDF Download & Available Length 30 Minutes & HandsOn )
Dataset Caching
Dataset Un-persisting
Dataset Eager checkpoint
Dataset Non-Eager checkpoint
Dataset Lineage truncation
Dataset Performance Improvements
DataFrame Caching
DataFrame Un-persist
Check the UI for storage
Module-14 : Dataset JOINS ( PDF Download & Available Length 25 Minutes & HandsOn )
Broadcast joins
Dataset joins
Dataset Joins and Hints
Module-15 : RelationalGroupedDataset ( PDF Download & Available Length 29 Minutes & HandsOn )
GroupBy operations on Datasets
Different types of aggregate functions
Dataset Union function
Grouping Sets example
Module-16 : SparkSQL Rollup, Cube and Pivot Operations ( PDF Download & Available Length 20 Minutes & HandsOn )
Rollup operations
Pivot Operations
Cube operations
Module-17 : Spark SQL Functions ( PDF Download & Available Length 20 Minutes & HandsOn )
Standard available functions
User defined functions (UDF)
Window Aggregate Function
Inline and Explicitly creating UDF
Understand UDFs and Aggregate UDFs
Define and Register UDFs
Define Aggregate UDFs
Use custom-created UDFs
Module-18 : SparkSQL Rank and Cumulative Distribution functions ( PDF Download & Available Length 12 Minutes & HandsOn )
Understanding and HandsOn with Rank functions
Creating a Window Spec
Calculating Cumulative Distribution Values using CDF
Module-19 : Row Number and Lead-Lag window functions ( PDF Download & Available Length 12 Minutes & HandsOn )
Understanding of Row number functions
Understanding of Lead-Lag functions