Deep Q Networks and Robotic Arms to play Connect Four
by Sam Zlota
Final Project for CS4910 Special Topics in Computer Science taught by David Klee
Code: https://github.com/sam-zlota/robo-connectfour
Action Space: There are 7 actions, one per column: [0, 1, 2, 3, 4, 5, 6]
State Space: The board is represented by a 6x7 matrix where each cell is either empty, occupied by player 1, or occupied by player 2. There are more than 4 trillion legal states
Reward: -1 for loss, 1 for win
NN Architecture: MLP with 2 hidden layers
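A minimal sketch of what this Q-network might look like in PyTorch, assuming the 3x6x7 observation is flattened before the MLP and using an illustrative hidden width of 128; the actual layer sizes and framework details in the repo may differ.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP with 2 hidden layers: flattened 3x6x7 observation -> 7 Q-values."""
    def __init__(self, obs_shape=(3, 6, 7), num_actions=7, hidden=128):
        super().__init__()
        in_dim = obs_shape[0] * obs_shape[1] * obs_shape[2]
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per column
        )

    def forward(self, obs):
        return self.net(obs)
```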
Observation Representation:
Using the raw board state directly is not ideal for training because it does not capture whose turn it is.
Observation:
3x6x7 tensor
First channel: a 6x7 binary matrix with a 1 wherever the first player occupies that spot
Second channel: a 6x7 binary matrix with a 1 wherever the second player occupies that spot
Third channel: a 6x7 matrix of all 1s or all -1s indicating whose turn it is (1 for the first player, -1 for the second player)
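A sketch of how a board could be converted into this observation, assuming the board is a 6x7 numpy array with 0 = empty, 1 = player 1, and 2 = player 2 (a hypothetical encoding; the repo's actual representation may differ).

```python
import numpy as np

def encode_observation(board, current_player):
    """board: 6x7 array with 0=empty, 1=player 1, 2=player 2.
    current_player: 1 or 2. Returns a 3x6x7 float32 observation."""
    obs = np.zeros((3, 6, 7), dtype=np.float32)
    obs[0] = (board == 1)                           # first player's pieces
    obs[1] = (board == 2)                           # second player's pieces
    obs[2] = 1.0 if current_player == 1 else -1.0   # whose turn it is
    return obs
```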
Training Details:
Exploration
In some experiments, the agent took a uniformly random move with a certain probability (epsilon-greedy exploration).
Demonstration
In other experiments, the agent instead took an expert move with a certain probability. This allowed for faster convergence. A sketch covering both selection schemes follows.
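A sketch of how both variants might select actions; `q_network`, `expert_move`, and `legal_actions` are assumed interfaces rather than names from the repo, and the greedy branch masks full columns so the chosen move is always legal.

```python
import random
import torch

def select_action(obs, q_network, epsilon, legal_actions, expert_move=None):
    """With probability epsilon take a random move (exploration) or, if
    expert_move is provided, an expert move (demonstration); otherwise greedy."""
    if random.random() < epsilon:
        if expert_move is not None:
            return expert_move(obs)          # demonstration variant
        return random.choice(legal_actions)  # exploration variant
    with torch.no_grad():
        q_values = q_network(torch.as_tensor(obs).unsqueeze(0)).squeeze(0)
    # mask out full columns so the greedy action is always legal
    mask = torch.full_like(q_values, float("-inf"))
    mask[legal_actions] = 0.0
    return int(torch.argmax(q_values + mask).item())
```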
Evaluation Metrics
Optimal play does not necessarily equate to a 100% win rate: Connect Four is a first-player win under perfect play, so an optimal player going first should always win, while an optimal player going second will sometimes lose. I therefore used metrics such as game length and draw rate to check whether the agent was learning defensive tactics; a sketch of this evaluation loop follows.
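A small sketch of how game length and draw rate could be tracked during evaluation; `play_game` is a hypothetical helper that plays one full game and reports the winner and move count.

```python
def evaluate(agent, opponent, play_game, num_games=100):
    """Report win/draw rates and average game length over num_games.
    play_game(agent, opponent) is assumed to return (winner, num_moves),
    where winner is "agent", "opponent", or None for a draw."""
    wins = draws = total_moves = 0
    for _ in range(num_games):
        winner, num_moves = play_game(agent, opponent)
        wins += winner == "agent"
        draws += winner is None
        total_moves += num_moves
    return {
        "win_rate": wins / num_games,
        "draw_rate": draws / num_games,
        "avg_game_length": total_moves / num_games,
    }
```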
Experiment 1: Baseline Results
250,000 steps
25,000 replay buffer size
Epsilon-greedy exploration, annealed from 0.99 to 0.5
Experiment 2: Effect of longer training
1,250,000 steps
125,000 replay buffer size
Epsilon-greedy exploration, annealed from 0.99 to 0.5
Experiment 3: Effect of Demonstration
1,250,000 steps
125,000 replay buffer size
Epsilon-greedy demonstration, annealed from 0.99 to 0.5
Experiment 4: Demonstration with Longer Training
1,700,000 steps (training crashed before completion)
150,000 replay buffer size
Epsilon-greedy demonstration, annealed from 0.95 to 0.7
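All four experiments anneal the exploration or demonstration probability over training (e.g., 0.99 to 0.5). A minimal sketch of such a schedule, assuming the decay is linear over the whole run; the actual schedule used may differ.

```python
def epsilon_schedule(step, total_steps, eps_start=0.99, eps_end=0.5):
    """Linearly anneal epsilon from eps_start to eps_end over total_steps."""
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Example: Experiment 2's schedule at the halfway point of 1,250,000 steps
print(epsilon_schedule(625_000, 1_250_000))  # 0.745
```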
DQN Agent (YELLOW) goes first and plays Expert Agent (RED)
The DQN Agent's first move is the middle column, which is the optimal opening.
It also adopts the simple strategy of building a vertical tower right away. This may not be optimal, but it is still an intelligent strategy.
This agent was trained for 1.7 million steps with demonstrations from an expert agent.
Robotic Setup
Proof of concept:
Hard-code joint angles for each action (dropping a piece into a given column) from a fixed position relative to the board, as sketched below
Fixed grasp procedure
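A sketch of this proof-of-concept control flow; the joint-angle values and the `robot` interface (`move_to_joints`, `close_gripper`, `open_gripper`) are hypothetical stand-ins for the real robot API.

```python
# Hypothetical pre-recorded arm configurations (7 joint angles each),
# captured with the robot at a fixed position relative to the board.
PICKUP_JOINT_ANGLES = [0.00, -0.80, 0.20, -1.50, 0.00, 1.80, 0.79]
DROP_JOINT_ANGLES = {
    col: [0.10 * col - 0.30, -0.50, 0.30, -1.20, 0.00, 1.60, 0.80]
    for col in range(7)
}

def play_move(robot, column):
    """Fixed grasp, then move to the hard-coded pose for the column and release."""
    robot.move_to_joints(PICKUP_JOINT_ANGLES)       # fixed grasp procedure
    robot.close_gripper()
    robot.move_to_joints(DROP_JOINT_ANGLES[column]) # hard-coded drop pose
    robot.open_gripper()                            # piece falls into the column
```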
Future stages:
Incorporate camera to read game state and trigger moves
Incorporate camera to learn dynamic grasps
Initial Challenges
Obstacle avoidance
The robot needed to be elevated
Board columns were too thin