This is a course I have been teaching at IIT Delhi and IIT Jammu, and I regularly offer it as a MOOC on NPTEL.
Lecture 0: OpenMP Course Intro [3 mins]
Lecture 1: Introduction to Parallel Programming [12 mins]
Lecture 2: Parallel Architectures and Programming Models [11 mins]
Lecture 3: Pipelining [9 mins]
Lecture 4: Superpipelining and VLIW [13 mins]
Lecture 5: Memory Latency [9 mins]
Lecture 6: Cache and Temporal Locality [15 mins]
Lecture 7: Cache, Memory bandwidth and Spatial Locality [12 mins]
Lecture 8: Intuition for Shared and Distributed Memory Architectures [17 mins]
Lecture 9: Shared and Distributed Memory Architectures [4 mins]
Lecture 10: Interconnection Networks in Distributed Memory Architectures (in brief) [15 mins]
Lecture 11: OpenMP: A Parallel Hello World Program [5 mins]
Lecture 12: Program with a single thread [14 mins]
Lecture 13: Program Memory with Multiple Threads and Multitasking [7 mins]
Lecture 14: Context Switching [12 mins]
Lecture 15: OpenMP: Basic thread functions [10 mins]
Lecture 16: OpenMP: About OpenMP [5 mins]
Lecture 17: Shared Memory Consistency Models and the Sequential Consistency Model [22 mins]
Lecture 18: Race Conditions [6 mins]
Lecture 19: OpenMP: Scoping variables and some race conditions [16 mins]
Lecture 20: OpenMP: Thread-private variables and more constructs [4 mins]
Lecture 21: Computing a sum: first attempt at parallelization [9 mins]
Lecture 22: Manual distribution of work and critical sections [11 mins]
Lecture 23: Distributing for loops and reduction [7 mins]
Lecture 24: Vector-Vector operations (Dot product) [19 mins]
Lecture 25: Matrix-Vector operations (Matrix-Vector Multiply) [15 mins]
Lecture 26: Matrix-Matrix operations (Matrix-Matrix Multiply) [20 mins]
Lecture 27: Introduction to tasks [7 mins]
Lecture 28: Task queues and task execution [11 mins]
Lecture 29: Accessing variables in tasks [11 mins]
Lecture 30: Completion of tasks and scoping variables in tasks [9 mins]
Lecture 31: Recursive task spawning and pitfalls [11 mins]
Lecture 32: Understanding LU Factorization [15 mins]
Lecture 33: Parallel LU Factorization [14 mins]
Lecture 34: Locks [24 mins]
Lecture 35: Advanced Task handling [15 mins]
Lecture 36: Matrix Multiplication using tasks [5 mins]
Lecture 37: The OpenMP Shared Memory Consistency Model [15 mins]
Lecture Slides Set-1: Basic OpenMP Constructs
Lecture Slides Set-2: Work Sharing Constructs
Lecture Slides Set-3: Shared Memory Consistency Models
Lecture Slides Set-4: OpenMP Locks