CS339-2023

Collaborative Robotics - Stanford University - 2023 - Team 4

Team Members

Saksham Consul

Evan Samuel Hymanson

Carlota Parés-Morlans

Paula Stocco

Problem Statement

The goal of the project was to program a robot for a joint dual robot resource gathering game. Both robots in the field are Trossen Robotics's LoCoBots. The game begins with the 2 LoCoBots placed randomly in the field containing randomly scattered red, blue, green, and yellow blocks. Each robot is provided with the information of the 4 possible base stations, the configuration of the blocks to be kept at each base station, as well as the color of blocks each robot is allowed to pick up.

We used the LoCoBot RGBD camera Intel RealSense camera for understanding the map (deciphering the locations of the blocks), and obtained information of the other LoCoBot's location (pose) from the Motiv Motion Capture system. Not only did we have to implement single-agent autonomy in the robot, which involved perception, locomotion (both base, arm, and gripper), it also had a collaborative aspect to it, where we had to come up with a strategy to select the 3 base stations from the possible 4 locations, without digitally communicating with the other LoCoBot.

Methods

Autonomy

The diagram below shows the coding framework containing our see-think-act cycles.

Perception begins with Camera input, from which the robot obtains the colored block coordinates and populated an occupancy grid of the floor. MoCap is used to identify own location, and the location of other robots on the field.

Planning is done within the 'Goal Meta Node', where the next goal is selected. This is either the next block to be picked up or which station to drop off the block currently being held.

Control method is chosen with the Brain node, depending the current state shown with transitions below below.

Anytime the replan function is called a new collision free trajectory is found using A* and then smoothed.
While in MOVE state the robot obtains a goal position based on on the current time it is along the planned back, and uses a differential flatness controller to compute the linear and angular velocities for control.
If at the beginning of the trajectory and unaligned to the path, if replan is called at any point and the robot is unaligned with the target heading, or if the robot reaches the target goal then the robot moves into the ALIGN state. There the robot uses a heading control to correct the control angle.
During the PICK and DROP states the Interbotix MoveIt package is used to command the arm and gripper

Autonomy Stack Implementation

Perception:

Cube Detection: a C++ function that returns the position and color of the different cubes found
Occupancy Grid: to observe and map surrounding environment for any obstacles, cubes to be chosen, opposing LoCoBots, and drop off locations for varying stations.
Map Dilation: to account for width of objects
April Tag Detection in Gazebo with according ROS package
Visuals in Rviz: LoCoBot planned path, markers for other LoCoBots, markers for each cube, detected, along with color detected, map dilation, and april detection markers

AprilTag Detection in Gazebo

Collaboration

Given:

Target configuration
Permissible color of blocks that can be picked up
Location of 4 possible base station locations

Target Configuration

To approach this collaboration problem, we first adopted the following assumptions:

The number of blocks in the world is equal to the number of blocks needed in the locations, i.e, there are no extra blocks
Blocks won’t move in the map, unless picked up, i.e, blocks aren’t displaced due to collision or external forces
Once a block has been picked and placed on a location, it will not be picked up again
A block picked up, will be dropped at a location, i.e, the robot will not the drop midway and leave it
There is no overlap in the color of blocks that the robots are allowed to pick up
The other LoCoBot can observe the world same as us, i.e, no partial observability
The time taken to pick and drop block is the proportional to distance
Robots will not collide with each other

Formulation

We adopt a hierarchical Bayesian approach to solve this 2 agent problem. We formulate the problem as a Semi-Markov Decision Process with unspecified rewards.

State: The state space S consists of all possible combinations of blocks in the 4 possible locations. A state can be represented as a 4 × 4 matrix

Initial State

Action: An action is the act of picking up a particular block and dropping it to a base station. Hence, the size of the action space |A| is equal to the number of blocks in the map. Additionally, since each robot can only pick some certain colors, which are disjoint to each other, the action space can be separated into A1, A2, where A1 ∪ A2 = A, and A1 ∩ A2 = ∅

Transition time: We model F to be the time taken to pick and place the block corresponding to the action to a base station. This is proportional to the distance of the block from the current location of the robot and the distance of the block to the selected location. The initial belief is set to be an uniform distribution.

Belief: Is the probability of a location being a base station. In our problem, we have 4 possible locations, and 3 possible locations. So belief at time t is a 4 × 3 matrix.

Initial Belief

Algorithm

Explore the world by spinning in 360 degrees in place (from initial location). Since the map, is free of any walls, rotating in place would provide complete information of the world.
Once the robot finishes its action, it will check if finished its action before the other robot. If so, it adopts the role of a leader, assuming that the other robot would adopt a follower strategy. Conversely, if it finishes second, it will assume the follower strategy. In the beginning, the action is the exploration, in future steps, it would be the action of picking and dropping a cube.

(a) As a leader: the robot will select a cube to put to a base station that results in maximum change in the shared belief of the world

(b) As a follower: the robot will select a cube to put to a base station resulting in minimum change in shared belief of the world

Repeat step 2, until the desired configuration is reached (success), or if the configuration leads to an inconsistent shared belief (failure)

When a robot has to select a cube to be placed at a location, it asks the following questions which have to be answered.

What's my belief to the mapping of locations to base stations?
Which color cube should I pick up?
Which cube of the color decided should I pick up?

The high level controller answers the second question. Depending on what role the robot is adopting, using the information from the state, and current belief, to select the color from the list of allowed colored cubes that the robot can pick up, that will change the belief of the robot the most or the least, by conducting a 1-step rollout.
The meta-controller checks that the color selected, and resultant future belief, does not contradict the configuration requirements of the task.
The low-level controller answers the third question. Now that the high level controller has presented a valid color, the robot has to select a cube of that color. In our algorithm, the cube of the specified color that minimizes the expected time taken to pick and drop the cube is selected.

Bayesian Update

Bayes update can be applied as follows:

Which simplifies to:

The belief is then normalized. Additionally, since we assume that there are no extra blocks in the map, if 3 locations have blocks placed in them, then automatically, we set the probability of the empty location to be a base station to be 0.

Results

We show that for the case where both robots follow the above collaborative strategy, the target configuration is achieved no matter which robot begins or the placement of the block.

Case 1: Robot 1 is the first to complete exploration, and assumes leadership role, Robot 2 assumes follower role at t=0.

Action Sequence:

T = 0
Robot_1 chooses to put cube 3 which is green to goal A which it assumes is location 0

Robot_2 chooses to put cube 5 which is blue to goal A which it assumes is location 0

T=17

Robot_2 chooses to put cube 8 which is yellow to goal A which it assumes is location 0

T=26

Robot_1 chooses to put cube 4 which is green to goal C which it assumes is location 1

T=28

Robot_2 chooses to put cube 9 which is yellow to goal B which it assumes is location 2

T=52

Robot_1 chooses to put cube 2 which is red to goal A which it assumes is location 0

T=55

Robot_2 chooses to put cube 7 which is blue to goal B which it assumes is location 2

T=65

Robot_1 chooses to put cube 0 which is red to goal A which it assumes is location 0

T=79

Robot_2 chooses to put cube 6 which is blue to goal C which it assumes is location 1

T=82

Robot_1 chooses to put cube 1 which is red to goal B which it assumes is location 2

Change in the state, belief, and configuration in time

State

Belief

Configuration

Case 2: Robot 2 is the first to complete exploration, and assumes leadership role, Robot 1 assumes follower role at t=0.

Action Sequence

T = 0
Robot_1 chooses to put cube 2 which is red to goal A which it assumes is location 0

Robot_2 chooses to put cube 8 which is yellow to goal A which it assumes is location 0

T=11

Robot_2 chooses to put cube 9 which is yellow to goal B which it assumes is location 2

T=13

Robot_1 chooses to put cube 3 which is green to goal C which it assumes is location 1

T=38

Robot_2 chooses to put cube 5 which is blue to goal A which it assumes is location 0

T=39

Robot_1 chooses to put cube 0 which is red to goal A which it assumes is location 0

T=55

Robot_2 chooses to put cube 7 which is blue to goal B which it assumes is location 2

T=56

Robot_1 chooses to put cube 4 which is green to goal A which it assumes is location 0

T=79

Robot_2 chooses to put cube 6 which is blue to goal C which it assumes is location 1

T=83

Robot_1 chooses to put cube 1 which is red to goal B which it assumes is location 2

Change in the state, belief, and configuration in time

State

Belief

Configuration

The problem devolves into a turn-based game, due to the SMDP structure of the strategy. This behavior emerges from the algorithm itself and is not hard-coded.

Simulation

Video showing the simulation of the LoCoBot exploring the map in-place, and then navigating to a cube using A-star, picking a block, and then placing it to a base-station location

Open Source Code

Single-robot autonomy code: https://github.com/saksham36/Collaborative\_Robotics/tree/project

Collaborative strategy code: https://github.com/saksham36/Collaborative\_Robotics/tree/collab_strategy

Future Goals

Given more time, we would have started by tackling the following:

Integrate the collaborative strategy into the LoCoBot simulation, and verify it's convergence
Introduce a secondary LoCoBot in simulation, first running randomly, and secondly running collaboratively
Implement various other collaborative strategies for the secondary LoCoBot to test the robustness of our Bayesian collaborative strategy when interacting with other agents not following the same strategy
Transfer our implementation to the real world
Verify the collaborative strategy in the real world setting