Welcome to Practical RL

Welcome to the course!

We'll update this page with information about the course as we go.

Expectations

You can edit this document to tell us about your expectations for the course. 

Feedback

You can give feedback about this course while the course is happening. Use this link to provide feedback. You will need a Google ID, and you can update your answer as we go.

So far (as of Tuesday night) we have some feedback (3 responses) that was positive about OpenAI Gym but negative about the intro, and other feedback that was positive about the whole course! Some would like more hands-on programming; others seem to prefer more theory. We will try to have more of both! Please keep the feedback coming.

Vote for your demo

Use this link to vote

Slide materials

Slides are available for reference at https://github.coventry.ac.uk/aa3172/rl-course 

The intro to RL slides are at https://github.coventry.ac.uk/aa3172/presentations/tree/main/2023-02-rl 

Photos

Photos from the event are in a onedrive folder.

Notes

Monte Carlo code

On Tuesday night, we left with our Monte Carlo routine having a bug. Hopefully you found the immediate problem. However, I also asked GPT-4 for an opinion about the code. I thought the answer was insightful (apart from telling us we should be using gym and not gymnasium), so here it is:

Yes, there are a few bugs and potential issues in this code.

1. The first error is in the import statement. `gym` is the correct library for reinforcement learning environments, not `gymnasium`. 

    Change `import gymnasium as gym` to `import gym`.

2. In the line `Returns = [] * N_S`, the intent is to create a list of lists, but this expression just produces a single empty list. To create a list of empty lists, you should use a list comprehension.

    Change `Returns = [] * N_S` to `Returns = [[] for _ in range(N_S)]`.

3. In this line: `observation, reward, terminated, truncated, info = env.step(int(a))`, the standard `gym` environment's `step` function only returns four values: `observation, reward, done, info`. So trying to unpack these into five variables will lead to an error.

    Change this line to `observation, reward, done, info = env.step(int(a))`, and replace all instances of `terminated` and `truncated` with `done` in the code.

4. In the line `G = GAMMA * G + reward[i]`, you're trying to access the i-th reward with `reward[i]`, but `reward` is a float number, not a list. You should replace `reward[i]` with `rewards[i]`.

5. There is a potential off-by-one error in the for loop. The `range(i)` does not include `i` itself, yet the last step in the episode should be included in the return calculation. 

    Change `for j in range(i):` to `for j in range(i+1):`.

6. Lastly, there's no implementation of a policy. The function `first_visit_mc` takes `policy` as an argument, and it is assumed to be a function or an array that, given an observation (or state), provides an action. If it's not a function but instead an array or list, the dimensions and value ranges of this object should be validated.
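Since the original routine isn't reproduced here, a hypothetical sketch of a first-visit Monte Carlo prediction function with fixes 2, 4, and 5 applied may help. Note two deliberate differences from GPT-4's advice: we keep gymnasium's five-value `step` return (point 3 only applies to classic gym), and the inner loop accumulates the return backwards rather than re-summing forward for each step. The tiny `ChainEnv` class stands in for a gymnasium environment so the sketch is self-contained; its name and structure are invented for illustration.

```python
import numpy as np

class ChainEnv:
    """Hypothetical 3-state deterministic chain mimicking gymnasium's API:
    step() returns (obs, reward, terminated, truncated, info)."""
    N_S = 3

    def reset(self):
        self.s = 0
        return self.s, {}

    def step(self, action):
        self.s += 1                          # the single action moves right
        terminated = self.s == self.N_S - 1
        reward = 1.0 if terminated else 0.0
        return self.s, reward, terminated, False, {}

def first_visit_mc(env, policy, n_states, n_episodes=100, gamma=0.5):
    # Fix 2: one independent list per state, not `[] * n_states`
    returns = [[] for _ in range(n_states)]
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        states, rewards = [], []
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            a = policy[obs]                  # policy as a state -> action array
            states.append(obs)
            # Five-value unpacking matches gymnasium (fix 3 applies to old gym only)
            obs, reward, terminated, truncated, info = env.step(int(a))
            rewards.append(reward)           # Fix 4: collect rewards in a list
        # Walk backwards so every step, including the last, enters the return (fix 5)
        G = 0.0
        for t in reversed(range(len(states))):
            G = gamma * G + rewards[t]
            if states[t] not in states[:t]:  # first-visit check
                returns[states[t]].append(G)
                V[states[t]] = np.mean(returns[states[t]])
    return V

V = first_visit_mc(ChainEnv(), policy=[0, 0, 0], n_states=3)
```

With gamma = 0.5 and a reward of 1 on reaching the terminal state, this chain gives V = [0.5, 1.0, 0.0].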

About your trainers

James Brusey is a Professor of Computer Science and co-Director of the Centre for Computer Science and Mathematical Methods at Coventry University, leading on AI for Cyberphysical Systems. His current research is in Machine Learning, Reinforcement Learning, and applied wireless networked sensing. Recent industrially-focused projects involve vehicle thermal comfort (with JLR), residential building and thermal comfort (Orbit Group), particle filter algorithms for flow measurement (with TUV-NEL), a new form of linear discriminant analysis (TUV-NEL), algorithms for network packet reduction (EU-funded STARGATE project with Rolls-Royce), decision support for buildings monitoring, elderly / infirm falls and near-falls sensing.

He provides thought leadership to the virtualisation framework for the EU H2020 DOMUS project (involving Fiat (CRF), Toyota (TME), and Volvo among others) that aims to revolutionise thermal comfort systems for electric vehicles.

James Brusey received his PhD from RMIT University in 2003. He has over 15 years' experience in the IT industry, part of which was as an independent consultant. He has taught research methods to all levels of researchers around the world and teaches practical courses in wireless sensing and Internet of Things to postgraduate students. He is well published and cited, has graduated 20 PhD students, and received 25 grants with a total value of £35 million. His professorship was awarded in 2018.

Dr. Khawaja Fahad Iqbal is an Assistant Professor and Post-Graduate Program Coordinator at the Department of Robotics and Artificial Intelligence, National University of Sciences and Technology (NUST), Pakistan. He is also serving as the Co-Principal Investigator of the Intelligent Robotics Lab (IRL) at the National Center of Artificial Intelligence (NCAI), Pakistan. He received his PhD in Robotics in 2022 and his MS in Bioengineering and Robotics in 2017 from Tohoku University, Japan. He was awarded the Japanese Government (MEXT) scholarship for both his MS and PhD. His research interests include Motion Planning, Collaborative Robots, Reinforcement Learning, and Simultaneous Localization and Mapping (SLAM).