Syllabus:
Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks.
U Dinesh Kumar, Business Analytics: The Science of Data-Driven Decision Making, Wiley, 1st Edition, 2017.
Yuxi Li, Deep Reinforcement Learning: An Overview, arXiv e-print, 2018.
David Silver, Lecture resources on Introduction to Deep Reinforcement Learning (DeepMind site).
Markov Decision Processes: Discrete Stochastic Dynamic Programming by Martin Puterman
Stochastic Approximation: A Dynamical Systems Viewpoint by Vivek Borkar
Neuro-Dynamic Programming by Dimitri Bertsekas and John Tsitsiklis
Markov Chains and Mixing Times by David Asher Levin, Elizabeth Wilmer, and Yuval Peres
Theses:
Safe Reinforcement Learning by Philip Thomas
Breaking the Deadly Triad in Reinforcement Learning by Shangtong Zhang
Actor-Critic Algorithms by Vijaymohan Konda
Notes:
Introduction to discrete-time Markov chains I by Karl Sigman
Markov chains II: recurrence and limiting (stationary) distributions by Karl Sigman
Probability (Part 3)
Introduction to Markov Processes
Q-learning Example, Function Approximation Example
Updated Slides 2024
Unit-1 Introduction: Course logistics and overview. Origin and history of Reinforcement Learning research. Its connections with related fields and with different branches of machine learning. Probability Primer: Brush-up of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, expectation. Joint and multiple random variables; joint, conditional and marginal distributions. Correlation and independence.
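As a brush-up of the joint, marginal and conditional distribution concepts listed in Unit-1, here is a minimal Python sketch (assuming NumPy) on a small, made-up joint PMF of two discrete random variables X and Y. The numbers are illustrative only.

```python
import numpy as np

# Hypothetical joint PMF p(x, y): rows index X in {0, 1}, columns index Y in {0, 1, 2}.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.20, 0.30, 0.10],
])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

assert np.isclose(joint.sum(), 1.0)  # axiom: total probability is 1

# Marginals: sum the joint PMF over the other variable.
p_x = joint.sum(axis=1)  # P(X = x)
p_y = joint.sum(axis=0)  # P(Y = y)

# Conditional P(Y = y | X = x) = p(x, y) / P(X = x), one row per value of x.
p_y_given_x = joint / p_x[:, None]

# Expectations, covariance and correlation.
e_x = (x_vals * p_x).sum()
e_y = (y_vals * p_y).sum()
e_xy = (np.outer(x_vals, y_vals) * joint).sum()
cov_xy = e_xy - e_x * e_y
var_x = (x_vals**2 * p_x).sum() - e_x**2
var_y = (y_vals**2 * p_y).sum() - e_y**2
corr_xy = cov_xy / np.sqrt(var_x * var_y)

# X and Y are independent iff p(x, y) = P(X = x) * P(Y = y) for every (x, y).
independent = np.allclose(joint, np.outer(p_x, p_y))
print(p_x, p_y, e_x, e_y, cov_xy, corr_xy, independent)
```

Note that a nonzero covariance already rules out independence, but zero covariance alone would not prove it; the element-wise product check is the actual definition.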
Unit-2 Markov Decision Process: Introduction to RL terminology, Markov property, Markov chains, Markov reward process (MRP). Introduction to and proof of Bellman equations for MRPs, along with proof of existence of a solution to the Bellman equations in MRPs. Introduction to Markov decision process (MDP), state and action value functions, Bellman expectation equations, optimality of value functions and policies, Bellman optimality equations.
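The Bellman equation for an MRP can be written in matrix form as v = R + γPv, where P is the state-transition matrix and R the expected-reward vector. For γ < 1 the matrix I - γP is invertible (P is row-stochastic), so the unique solution is v = (I - γP)^(-1) R. The sketch below (Python with NumPy; the 3-state MRP, its rewards and γ = 0.9 are hypothetical) checks that solving this linear system and repeatedly applying the Bellman backup give the same values.

```python
import numpy as np

# Hypothetical 3-state MRP: P[s, s'] is the transition probability,
# R[s] the expected immediate reward in state s, gamma the discount factor.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],  # state 2 is absorbing
])
R = np.array([1.0, 2.0, 0.0])
gamma = 0.9

# Matrix form of the Bellman equation: v = R + gamma * P v  =>  (I - gamma P) v = R.
v_direct = np.linalg.solve(np.eye(3) - gamma * P, R)

# The same v is the fixed point of repeatedly applying the Bellman backup.
v_iter = np.zeros(3)
for _ in range(1000):
    v_iter = R + gamma * P @ v_iter

print(v_direct, v_iter)
```

The direct solve costs roughly O(n^3) in the number of states, which is why iterative methods are preferred for large state spaces.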
Unit-3 Prediction and Control by Dynamic Programming: Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, principle of optimality, iterative policy evaluation, policy iteration, value iteration, Banach fixed-point theorem, proof of the contraction mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, DP extensions. Monte Carlo Methods for Model-Free Prediction and Control: Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, importance sampling.
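Value iteration repeatedly applies the Bellman optimality backup V(s) ← max_a [R(s, a) + γ Σ_s' P(s' | s, a) V(s')]. Because this operator is a γ-contraction in the max norm, the Banach fixed-point theorem guarantees convergence to V*. Below is a minimal Python sketch on a hypothetical 3-state, 2-action MDP; the array layout P[a, s, s'], R[s, a], the stopping tolerance and the function name value_iteration are illustrative choices, not a prescribed implementation.

```python
import numpy as np

# Hypothetical MDP: states 0, 1, 2 (state 2 absorbing); action 0 = stay, action 1 = move right.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] transition probabilities
R = np.zeros((n_states, n_actions))            # R[s, a] expected immediate reward
for s in range(n_states):
    P[0, s, s] = 1.0                           # stay in place
    P[1, s, min(s + 1, n_states - 1)] = 1.0    # move right; state 2 loops to itself
R[1, 1] = 1.0                                  # moving from state 1 into state 2 pays +1


def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Iterate the Bellman optimality backup until the sup-norm change drops below tol."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


V, policy = value_iteration(P, R, gamma)
print(V, policy)
```

Here V should come out as approximately [0.9, 1, 0], and the greedy policy moves right in states 0 and 1 (state 2 is a tie, broken toward action 0 by argmax).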
Unit-4 Function Approximation Methods: Function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least squares, experience replay in deep Q-networks. Policy Gradients: Getting started with policy gradient methods, log-derivative trick, naive REINFORCE algorithm, bias and variance in reinforcement learning, reducing variance in policy gradient estimates, baselines, advantage function, actor-critic methods.
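The log-derivative trick turns the policy gradient into an expectation that can be sampled: ∇θ J(θ) = E[(G - b) ∇θ log πθ(a)], where b is a baseline. The sketch below is a minimal, hypothetical example in Python with NumPy: a two-armed bandit (one-step episodes) with a softmax policy, for which ∇θ log πθ(a) = onehot(a) - πθ. The arm means, step sizes and running-average baseline are assumptions chosen for illustration, not settings prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])  # hypothetical two-armed bandit, one-step episodes


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


theta = np.zeros(2)   # policy logits
baseline = 0.0        # running average of returns, used as the baseline b
alpha, beta = 0.1, 0.05  # policy and baseline step sizes (assumed values)

for episode in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    G = rng.normal(true_means[a], 0.1)  # sampled return of this one-step episode
    # Log-derivative trick for a softmax policy: grad log pi(a) = onehot(a) - pi.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    # REINFORCE with baseline: theta <- theta + alpha * (G - b) * grad log pi(a).
    theta += alpha * (G - baseline) * grad_log_pi
    baseline += beta * (G - baseline)

final_pi = softmax(theta)
print(final_pi)
```

Subtracting a baseline that does not depend on the action leaves the expected gradient unchanged but reduces its variance; setting beta = 0 (so the baseline stays at 0) recovers the naive REINFORCE update.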
Reinforcement Learning Interview Questions
ADAPTED FROM:
https://www.mlstack.cafe/interview-questions/reinforcement-learning
Q1: What is Reinforcement Learning? How does it compare with other ML techniques?
Q2: How to define States in Reinforcement Learning?
Q3: Name some approaches or algorithms you know to solve a problem in Reinforcement Learning
Q4: Provide an intuitive explanation of what a Policy is in Reinforcement Learning
Q5: What are the steps involved in a typical Reinforcement Learning algorithm?
Q6: What is Markov Decision Process?
Q7: What is the difference between Off-Policy and On-Policy Learning?
Q8: What is the difference between a Reward and a Value for a given State?
Q9: What is the role of the Discount Factor in Reinforcement Learning?
Q10: Are there any problems when using the Epsilon-Greedy method to find the Optimal Policy?
Q11: Is the Monte Carlo Method applicable to all tasks?
Q12: Can you think of an example of an Epsilon-Greedy Policy in real life?
Q13: Compare Reinforcement Learning and Supervised Learning
Q14: How does the Monte Carlo prediction method compute the Value Function?
Q15: How to choose the values of Gamma and Lambda in generalised temporal-difference algorithms?
Q16: Name some advantages of using Temporal-Difference vs Monte Carlo methods for Reinforcement Learning
Q17: What type of Neural Networks does Deep Reinforcement Learning use?
Q18: What types of Reinforcement Learning Environments do you know?
Q19: What's the difference between Q-Learning and Policy Gradients methods?
Q20: What's the difference between a Deterministic vs Stochastic policy?
Q21: Why would you use a Deep Q-Network?
Q22: Why would you use a Policy-based method instead of a Value-based method?
****************************
Q1: What is Reinforcement Learning? How does it compare with other ML techniques?
Q2: What is Markov Decision Process?
Q3: Provide an intuitive explanation of what a Policy is in Reinforcement Learning
Q4: What is the role of the Discount Factor in Reinforcement Learning?
Q5: Name some approaches or algorithms you know to solve a problem in Reinforcement Learning
Q6: How to define States in Reinforcement Learning? (Related to: Q-Learning)
Q7: What is the difference between a Reward and a Value for a given State?
Q8: How do you know when a Q-Learning Algorithm converges? (Related to: Q-Learning)
Q9: What do Stationary Dynamics and a Stationary Policy mean in the context of Reinforcement Learning?
Q10: What are the steps involved in a typical Reinforcement Learning algorithm?
Q11: What is the difference between Off-Policy and On-Policy Learning?
Q12: What do the Alpha and Gamma parameters represent in Q-Learning? (Related to: Q-Learning)
Q13: What type of Neural Networks does Deep Reinforcement Learning use? (Related to: Neural Networks)
Q14: Compare Reinforcement Learning and Supervised Learning (Related to: Supervised Learning)
Q15: What's the difference between a Deterministic vs Stochastic policy?
Q16: How does the Q function differ from the Value function in Reinforcement Learning?
Q17: What is the difference between Q-Learning and SARSA, and when would you use each one? (Related to: Q-Learning)
Q18: Can you think of an example of an Epsilon-Greedy Policy in real life?
Q19: What types of Reinforcement Learning Environments do you know?
Q20: What's the advantage of using Policy Iteration vs Value iteration?
Q21: Is the Monte Carlo Method applicable to all tasks? (Related to: Monte Carlo Method)
Q22: How to distinguish Episodic Tasks vs Continuous Tasks?
Q23: How does the Monte Carlo prediction method compute the Value Function? (Related to: Monte Carlo Method)
Q24: What types of Monte Carlo Prediction Algorithms do you know? (Related to: Monte Carlo Method)
Q25: Name some advantages of using Temporal-Difference vs Monte Carlo methods for Reinforcement Learning (Related to: Monte Carlo Method)
Q26: Name some advantages of using Monte Carlo vs Dynamic Programming methods in Reinforcement Learning (Related to: Monte Carlo Method)
Q27: Why would you use a Policy-based method instead of a Value-based method?
Q28: Why would you use a Deep Q-Network?
Q29: What is the difference between an episode and an epoch in Deep Q-Learning? (Related to: Q-Learning)
Q30: Are there any problems when using REINFORCE to obtain the optimal policy?
Q31: What's the difference between Learning Rate Decay and Epsilon Decay? What is the context of each one?
Q32: Are there any problems when using the Epsilon-Greedy method to find the Optimal Policy?
Q33: What's the difference between a Deep Q-Network and a Categorical Deep Q-Network? (Related to: Q-Learning)
Q34: How to choose the values of Gamma and Lambda in generalised temporal-difference algorithms?
Q35: Can Q-learning be used for continuous (state or action) spaces? If not, what would you use? (Related to: Q-Learning)
Q36: What's the difference between Q-Learning and Policy Gradients methods? (Related to: Q-Learning)
Q37: Can you apply Value Iteration and Policy Iteration in any environment?
Q38: What's the difference between Deep Q-Learning and Policy Gradient Methods? (Related to: Q-Learning)
Q39: What is Sample Efficiency, and how can Importance Sampling be used to achieve it?
Q40: What is the difference between vanilla policy gradient (VPG) with a baseline as value function and advantage actor-critic (A2C)?
Q41: What are some best practices when trying to design a Reward Function?
Q42: How can policy gradients be applied in the case of multiple continuous actions?
Q43: When would you use a Deep Recurrent Q-Network? (Related to: Deep Learning)
Q44: Is the optimal policy always Stochastic if the environment is also Stochastic?
Q45: How does a Double Deep Q-Network differ from a Deep Q-Network?
Q46: Are there any problems when using a Softmax Function to select actions in a Deep Q-Network?
Q47: Why do we need the target network in a Deep Q-Network? (Related to: Q-Learning)
Q48: What are some advantages of Quantile Regression DQN over Categorical DQN? (Related to: Q-Learning)
Q49: What is the effect of Parallel Environments in Reinforcement Learning?
Q50: How does the Actor-Critic method differ from the Policy Gradient with the Baseline method?
Q51: What is Experience Replay and what are its benefits?
Q52: Why do regular Q-Learning and DQN overestimate the Q values?
Q53: What's the difference between Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C)? (Related to: Q-Learning)
Q54: Can SARSA be used in a Partially Observable Markov Decision Process? If yes (or not), why?
Assignment 1
Assignment 2