Welcome to the HadoopExam PySpark Structured Streaming Professional Training with Hands-On Sessions.
To access this training, you must have a subscription from www.HadoopExam.com
Please check here to get full training access
Use SignIn to log in with your permitted email ID
Use the Pedagogy Navigation to watch the individual problem and solution videos
Syllabus covered as part of this training (become a Spark Structured Streaming expert in 8+ hours of training): 10 hands-on exercises covering all the concepts
Module-1: Spark Streaming in Depth Part-1 ( PDF Download & Available Length 26 Minutes)
Real-time and near-real-time data processing
Streaming Sources and Sinks
Streaming Concepts
Stock Visualization Example (How Streaming Is Helpful)
Module-2: Apache Spark DataFrame Introduction ( PDF Download & Available Length 21 Minutes)
DataFrame
DataFrame v/s Dataset
Sample API for DataFrame
Language Independent Catalyst Optimizer
Module-3: Introduction to the Apache Spark Catalyst Optimizer ( PDF Download & Available Length 38 Minutes)
What is the Catalyst optimizer
Concepts of Tree and Rules
Various Phases of Catalyst optimizer
Analysis
Logical optimization
Physical planning
Code Generation
Predicate Pushdown
Constant Folding
Physical operator
Project Pruning
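As a rough illustration of the "Tree and Rules" and "Constant Folding" topics above, the sketch below models an expression as a small tree and applies one rewrite rule to it. The class names (Literal, Attribute, Add) and the fold_constants function are hypothetical and written in plain Python; this is not Catalyst's actual code, only the general idea of a rule that rewrites a tree.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """A constant value in the expression tree (hypothetical node type)."""
    value: int


@dataclass(frozen=True)
class Attribute:
    """A column reference whose value is unknown at planning time."""
    name: str


@dataclass(frozen=True)
class Add:
    """An addition node with two child expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]


def fold_constants(expr):
    """Rule: replace Add(Literal, Literal) with a single Literal, bottom-up."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(left.value + right.value)
        return Add(left, right)
    # Literals and attributes are leaves; nothing to fold.
    return expr


# x + (1 + 2): only the constant subtree can be evaluated ahead of time.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
folded = fold_constants(tree)
print(folded)
```

The rule never touches the Attribute node, because a column's value is only known when the query runs.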
Module-4 : Introduction to Structured Streaming ( PDF Download & Available Length 38 Minutes)
Purpose: How it differs from other streaming solutions
Anatomy of a Structured Streaming Application
- Source
- Input Table
- Transformation
- Result Table
- Sink
Catalyst Optimization of a Streaming Application
Module-5 : Programming concepts of Structured Streaming ( PDF Download & Available Length 11 Minutes)
Programming and Basic Concepts of Structured Streaming
Discussion with Pseudo Code
Some important points
Module-6 : Structured Streaming is different and less painful ( PDF Download & Available Length 19 Minutes)
Event Time v/s Processing Time
Understanding of Late Processing
Exactly once processing
Replayable sources and idempotent sinks
Module-7: Structured Streaming Essentials ( PDF Download & Available Length 16 Minutes)
Common Issues with the Legacy DStream Solution
Essentials for Structured Streaming
Introduction to Triggers
Introduction to Watermarks
Revisit the concepts Learned
Assumptions for Structured Streaming
Module-8A : Install VMware Workstation Player ( PDF Download & Available Length 8 Minutes) : Env Setup for Hands On
Module-8B : Install Ubuntu Linux in VMware Player ( PDF Download & Available Length 23 Minutes) : Env Setup for Hands On
Install Ubuntu Image
Install SSH server
Install PuTTY and connect to the Linux OS
Module-8C : Install Apache Spark ( PDF Download & Available Length 17 Minutes) : Env Setup for Hands On
Install Apache Spark
Start spark-shell
Start pyspark
Module-9 : Update the Installed Spark Version ( PDF Download & Available Length 8 Minutes) : Env Setup for Hands On
Install latest version of Spark
Module-10 : Sample Streaming Exercise ( PDF Download & Available Length 9 Minutes) : Hands On (1-Exercise)
Module-11 : Sample Streaming Exercise ( PDF Download & Available Length 20 Minutes) : Hands On (3-Exercises)
Reading from a Directory and Display on the console
Reading from a Directory and use SQL query operations
Aggregation query
Module-12: Late Event and Watermark ( PDF Download & Available Length 30 Minutes)
Late Data and Watermarks
Common operations on Streaming Data
Understanding of Window Operations
Window and Group By Operations
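The late data, watermark and window topics above can be pictured with a small plain-Python sketch. It assigns each event to a tumbling event-time window, counts events per window and key, and drops an event once its window has closed behind the watermark. The constants WINDOW_SECONDS and ALLOWED_LATENESS and the function names are hypothetical, and the watermark is advanced per event for simplicity; Spark itself advances the watermark between micro-batches, so treat this as a conceptual model rather than Spark's behavior.

```python
from collections import defaultdict

WINDOW_SECONDS = 10    # hypothetical tumbling window length
ALLOWED_LATENESS = 5   # hypothetical watermark delay


def window_start(event_time):
    """Start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def count_by_window(events):
    """Count (event_time, key) events per (window start, key).

    The watermark is the maximum event time seen so far minus the
    allowed lateness. An event is dropped when its window ended at or
    before the watermark.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = 0
    for event_time, key in events:
        watermark = max_event_time - ALLOWED_LATENESS
        start = window_start(event_time)
        if start + WINDOW_SECONDS <= watermark:
            dropped.append((event_time, key))
        else:
            counts[(start, key)] += 1
        max_event_time = max(max_event_time, event_time)
    return dict(counts), dropped


# Event times arrive out of order: 3 is late but accepted, 4 is too late.
events = [(1, "a"), (12, "a"), (3, "a"), (27, "a"), (4, "a")]
counts, dropped = count_by_window(events)
print(counts)
print(dropped)
```

The event at time 3 arrives after the event at time 12, yet it is still counted because the watermark (12 - 5 = 7) has not passed the end of its window. The event at time 4 arrives after time 27, when the watermark is 22, so its window is already closed.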
Module-13: Output Modes ( PDF Download & Available Length 25 Minutes)
Append Mode
Append Mode and Watermarks
Append Mode and Aggregations
Append Mode and Guaranteed Data Processing
Complete Mode
Complete Mode and Triggers
Complete Mode + Watermark + Aggregations
Update Mode
Update Mode and Watermark
Module-14: Process JSON data & Output as a Parquet file ( PDF Download & Available Length 21 Minutes) : Hands On : (1-Exercise)
Module-15: Watermark and Output modes Hands on Exercise ( PDF Download & Available Length 59 Minutes) : Hands On (1-Big Exercise, covering 5 concepts)
Module-16: Window and multi-key aggregations Hands On Exercise ( PDF Download & Available Length 20 Minutes) : Hands On (1-Exercise)
Module-17: Processing JSON Data ( PDF Download & Available Length 30 Minutes) : Hands On (1-Big Exercise, covering 2 concepts)
JSON Data Processing
File Triggers
Memory sink
Stream status
Static Data Join with Stream
Module-18: Inner and Outer Joins ( PDF Download & Available Length 33 Minutes)
Stream-static joins
Stream-Stream join challenges
Inner Join
Inner Join and watermark
Outer Join and Watermark
Outer Join Important points
Module-19: Inner and Outer Joins ( PDF Download & Available Length 20 Minutes) : Hands On (1-Big Exercise, covering 2 concepts)
Stream-static joins
Stream-Stream join challenges
Inner Join
Inner Join and watermark
Outer Join and Watermark
Outer Join Important points
Module-20: Drop Duplicate data ( PDF Download & Available Length 11 Minutes)
Remove duplicate data (unbounded)
Remove duplicate data (bounded)
Module-21: Drop Duplicate data ( PDF Download & Available Length 11 Minutes) : Hands On : (1-Exercise)
Remove duplicate data (unbounded)
Remove duplicate data (bounded)
Module-22: Structured Streaming: Multiple Streams ( PDF Download & Available Length 11 Minutes)
Global Watermark
foreach and foreachBatch
Triggers
One Time Batch (Cost saving optimization)
Monitoring operations on Structured Streaming