Taught a Car to Drive (Without Telling It How)
The "Show, Don't Tell" Approach to AI
Most people think teaching a machine to do something looks like this: you gather thousands of examples of the right answer, label them carefully, and feed them to a model until it learns to copy what you did. That is called supervised learning. It works great for recognizing cats in photos or filtering spam emails.
But here is the problem. How do you teach a car to drive using that method? You would need hours of perfect driving data. Every lane change, every brake tap, every turn signal. And the car would only learn to copy those specific actions. If it encounters a situation slightly different from your training data – a construction zone, a kid chasing a ball into the street – it has no idea what to do.
Supervised learning is imitation. It is fancy mimicry. Reinforcement learning is different. You do not show the car what to do. You just tell it the goal: get from point A to point B without hitting anything. Then you let it figure out the rest through trial and error. Good actions get a small reward. Bad actions (crashes) get a penalty. Over time, the car builds its own understanding of what works.
This is closer to how humans actually learn. Nobody gave you a labeled dataset of every possible driving scenario when you turned sixteen. You got behind the wheel, made mistakes, and learned from them.
A Fun Practice Project (Not Production Ready, And That Is Fine)
I built a small self-driving car simulation as a way to learn reinforcement learning hands-on. This is not a real autonomous vehicle system – it is a 2D environment with simplified physics and fake LiDAR sensors. But that is exactly why it works for learning. You can train a model in minutes instead of months, and crashing just resets the simulation instead of totaling a real car.
The car sees the world through a sensor system that detects obstacles in different directions – think of it as a very basic version of the LiDAR sensors real self-driving cars use. It can also feel its own speed and heading. That is all the information it gets.
Then it takes an action: accelerate, brake, turn left, turn right. The environment updates the car's position, checks for collisions, and gives back a reward. Moving toward the target gives a small positive reward. Reaching the target gives a large reward. Hitting an obstacle gives a negative reward. Doing nothing gives nothing.
The learning algorithm is a Deep Q-Network or DQN. That is just a fancy way of saying "a neural network that guesses how good each possible action will be in the current situation." The network gets better over time because of something called experience replay – it stores memories of what happened, then randomly replays those memories to learn from them repeatedly.
![A simple 2D simulation showing a small red square (the car) navigating between blue circles (obstacles) toward a green target]
*A basic driving environment: the car (red) learns to avoid obstacles (blue) and reach the target (green).*
Why Reinforcement Learning Is More Interesting Than Supervised Learning
Here is the part that surprised me when I first learned this. Supervised learning is actually harder in many real-world situations, even though it is the more common approach.
Think about teaching a robot to walk. For supervised learning, you need a dataset of a walking robot. But you do not have one. That is what you are trying to create. It is a chicken-and-egg problem. For reinforcement learning, you just need a way to measure "is the robot still upright?" and "has it moved forward?" The robot figures out the walking part on its own.
The same applies to driving. No one has a complete dataset of every possible driving scenario. But with reinforcement learning, the car can train in simulation, crash ten thousand times in a single hour, and gradually learn policies that work. Those policies often end up looking nothing like human driving. The car might take weird curved paths that a human would never try. But if they work, they work.
Semi-supervised learning sits somewhere in the middle – a little bit of labeled data plus a lot of unlabeled data. That is useful when labeling is expensive but not impossible. For driving, though, labeling every possible edge case is impossible. So reinforcement learning is not just fancy. It is often the only practical choice.
![A graph showing reward increasing over training episodes, with jagged ups and downs gradually trending upward]
*Learning curve: rewards start low and chaotic, then slowly improve as the car figures out what works.*
What This Project Actually Taught Me
The car I built is not going to win any races. It learns to navigate simple obstacle courses in a 2D grid. It sometimes gets stuck in local optima – for example, it discovers that stopping completely avoids crashes, so it just never moves. Fixing that requires tuning the reward system to encourage progress.
But that is exactly the point. Reinforcement learning forces you to think carefully about what you are actually rewarding. If you reward getting to the target but do not penalize slow movement, the car learns meandering paths. If you penalize steering too harshly, the car drives straight into walls. Getting the reward function right is most of the work.
The project includes training visualization, so you can watch the car improve over time. Early episodes look like drunk driving. Later episodes look deliberate. The model can be saved and loaded, so you do not have to retrain from scratch every time.
Final Note
Reinforcement learning is not a replacement for supervised learning. They solve different problems. Supervised learning is for when you have examples of the right answer. Reinforcement learning is for when you only know how to measure success.
For a self-driving car, you cannot possibly label every situation in advance. But you can absolutely define what success looks like: reach the destination, avoid collisions, follow traffic rules. That makes reinforcement learning a natural fit. It is messy, it takes thousands of failed attempts, and it produces driving behavior that looks nothing like a human's. But it works.
This project was just for fun and practice. No real cars were harmed. But understanding reinforcement learning changed how I think about problems – and it might do the same for you. 😆