Implementation

How our complete system works

There are three main packages that our system relies on to solve the maze.

/usb_cam: The usb_cam package reads images from the USB webcam and publishes them to the /usb_cam/image_raw topic.

/maze_solver: The maze_solver package is responsible for the CV and path planning described on the design page of this website. It subscribes to the /usb_cam/image_raw topic, processes the image and does planning, and then publishes the position of the ball and the next target position for the controller to the /maze_info topic. This information is published in the form of a maze_message.

/control_pkg: The control_pkg subscribes to the /maze_info topic to retrieve the current position of the ball and its next desired position. It then calculates the joint angles needed to move the ball to this position using PID control and ball-and-beam physics, and uses the Limb library to actuate the Sawyer robot so that those joint angles are achieved.

Let's now draw the connection between these packages and the physical system (diagram at left). The raw image provided by the webcam is a roughly-top-down view of the maze. This top-down view is then processed by the maze_solver to determine the current and desired position of the ball in meters. These values are then shared with the control_pkg via the /maze_info topic, and are used to determine the joint angles of the last two joints on the robotic arm (pictured in red at left). These joint angles induce an acceleration in the ball that guides it through the maze to the end target.

In the following sections, we will go into detail as to how the maze_solver and control_pkg packages work, in addition to describing message files and launch files used in our implementation.

Launch file - how to get things started

We built a launch file to run the packages mentioned above simultaneously and stored it in the final_proj/launch directory. The launch file initializes the usb_cam node, an image_view node that allows us to see the output of the usb_cam, a maze_solver node and a control_pkg node.
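ROS launch files are XML; a minimal sketch of what ours might look like is below. The node names, executable names, and camera parameters are illustrative assumptions, not a copy of our actual file.

```xml
<!-- Hypothetical sketch of a file in final_proj/launch -->
<launch>
  <!-- Webcam driver: publishes frames to /usb_cam/image_raw -->
  <node name="usb_cam" pkg="usb_cam" type="usb_cam_node" output="screen">
    <param name="video_device" value="/dev/video0" />
  </node>

  <!-- Live view of the webcam feed for debugging -->
  <node name="image_view" pkg="image_view" type="image_view">
    <remap from="image" to="/usb_cam/image_raw" />
  </node>

  <!-- Vision + planning: publishes maze_message on /maze_info -->
  <node name="maze_solver" pkg="maze_solver" type="maze_solver.py" output="screen" />

  <!-- Controller: subscribes to /maze_info and commands the Sawyer -->
  <node name="controller" pkg="control_pkg" type="controller.py" output="screen" />
</launch>
```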

Maze solver package

The schematic above outlines the logic behind the processes in the maze_solver package, which subscribes to the raw image from the webcam and is responsible for both the sensing and planning in our system.

The system consists of two primary parallel processes: Position detection for the ball and endpoint, and maze reconstruction + pathfinding. We use decision logic and conversion methods to leverage these two processes to publish the current position and desired position of the ball in meters.

  1. Position detection using thresholding

The position detection component uses the cv2 library to perform color thresholding, determining the positions of the ball and the endpoint in pixel space. Lower and upper bounds in HSV were determined by examining sample frames and then using online tools to look up candidate ranges. The thresholding process (sketched in code after this list) goes as follows:

  • Blur the image to reduce noise using cv2.medianBlur

  • Convert the blurred BGR image to HSV

  • Select the pixels that are within the predetermined lower and upper bounds of the HSV range for the desired color using cv2.inRange

  • Apply cv2.erode and cv2.dilate to once again smooth the mask by first eroding away the boundaries of the object and then increasing the area of the object

  • The result is a color-isolated image. See the image near the top left of the schematic for an example of what this looks like with our ball.

  • Convert the color-isolated image to grayscale and do binary thresholding on this new image

  • Find the contours in the grayscale image using cv2.findContours

  • Measure the radius of each contour using cv2.minEnclosingCircle, and check that the radius is greater than 5 px

  • If so, return the center of the circle derived above
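A minimal sketch of this pipeline in Python, assuming OpenCV 4 and NumPy; the HSV bounds, kernel sizes, and threshold values are placeholders rather than our exact tuning.

```python
import cv2
import numpy as np

def detect_position(bgr_image, hsv_lower, hsv_upper):
    """Return the (x, y) pixel center of the colored blob, or None."""
    # Blur to reduce noise, then convert BGR -> HSV.
    blurred = cv2.medianBlur(bgr_image, 5)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)

    # Keep only pixels inside the predetermined HSV bounds.
    mask = cv2.inRange(hsv, hsv_lower, hsv_upper)

    # Erode then dilate to smooth the mask.
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=2)
    mask = cv2.dilate(mask, kernel, iterations=2)

    # Color-isolated image: original pixels wherever the mask is set.
    isolated = cv2.bitwise_and(blurred, blurred, mask=mask)

    # Grayscale + binary threshold, then find contours (OpenCV 4 API).
    gray = cv2.cvtColor(isolated, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None

    # Fit a circle to the largest contour and reject small detections.
    (x, y), radius = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
    return (int(x), int(y)) if radius > 5 else None
```

For example, detect_position(frame, np.array([100, 150, 50]), np.array([130, 255, 255])) would look for a blue ball; a second call with the endpoint's bounds finds the colored end marker.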

Position detection is run every time an image is retrieved from the camera. From there, the system determines how to find the target position to send to the controller, either by solving the maze or by pulling from the saved path.

  2. Maze reconstruction and pathfinding

The maze reconstruction and pathfinding process is used to determine the path that the ball should take through the maze.

The maze reconstruction process takes place as follows:

  • The raw image from the camera is converted to grayscale, and binary thresholding is applied (bottom left picture in the schematic)

  • The thresholded image is then converted to "grid cells" using a max-pooling process: the image is divided into a 90x90 grid, and each grid cell is assigned the value of the maximum-valued pixel within it (see the code sketch after this list).

  • A buffer is then applied around the barriers in the grid representation of the maze: for each barrier cell, all of its neighbors that are free cells become buffer cells, indicated with a distinct integer value. (bottom second-from-left picture in schematic)
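A minimal sketch of the max-pooling and buffering steps, assuming the thresholded image uses 255 for barrier pixels; the integer cell encodings (0 = free, 1 = barrier, 2 = buffer) are illustrative.

```python
import numpy as np

FREE, BARRIER, BUFFER = 0, 1, 2

def build_grid(binary_image, n=90):
    """Max-pool a binary image (255 = barrier pixel) into an n x n grid."""
    h, w = binary_image.shape
    grid = np.full((n, n), FREE, dtype=np.uint8)
    for r in range(n):
        for c in range(n):
            cell = binary_image[r * h // n:(r + 1) * h // n,
                                c * w // n:(c + 1) * w // n]
            # A cell is a barrier if any pixel inside it is a barrier.
            if cell.max() > 0:
                grid[r, c] = BARRIER
    return grid

def add_buffer(grid):
    """Turn every free neighbor of a barrier cell into a buffer cell."""
    for r, c in np.argwhere(grid == BARRIER):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
                        and grid[nr, nc] == FREE):
                    grid[nr, nc] = BUFFER
    return grid
```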

At this point, we are ready to solve the maze! To do so, we need information from process 1 about the positions of the ball and endpoint. Steps are as follows.

  • Take the positions of the ball and endpoint and map them to their respective grid coordinates

  • If the position of the ball or the endpoint is found to be in a barrier or buffer cell, gradually color that cell and the cells around it as "free space" with an increasing radius until a path can be formed. This accounts for rare edge cases where the ball is in a shadowy corner which is perceived by our system to be maze boundary, or close enough to maze boundary that the position of the ball becomes part of the buffer.

  • Run A* or BFS to find a path between the cells of the ball and the end point; a sketch of the A* variant follows this list. (bottom right images in schematic)

  • Return the list of cells in the path and save it.
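A minimal sketch of the A* variant over the resulting grid; the 4-connected moves and Manhattan-distance heuristic are assumptions here.

```python
import heapq

FREE = 0  # same encoding as the grid sketch above

def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    def h(cell):  # Manhattan-distance heuristic (admissible on a grid)
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start)]   # (f = g + h, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while open_set:
        _, g, current = heapq.heappop(open_set)
        if current == goal:
            # Walk the parent pointers back to reconstruct the path.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (current[0] + dr, current[1] + dc)
            if not (0 <= nxt[0] < grid.shape[0] and 0 <= nxt[1] < grid.shape[1]):
                continue
            if grid[nxt] != FREE:   # barrier and buffer cells are blocked
                continue
            if g + 1 < cost.get(nxt, float("inf")):
                cost[nxt] = g + 1
                came_from[nxt] = current
                heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt))
    return None  # no path exists at the current grid resolution
```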

  3. Decision logic and returning the desired position

To combine the two processes above, we use the following logic (sketched in code after this list) to make sure we are publishing at a fast rate (20-50 Hz):

  • Upon every new webcam image, run the position detection process.

  • Upon receiving the initial image, and on a 5-second interval thereafter, run the maze reconstruction and pathfinding process and save a new path.

  • Next, find the position in the path that is closest to the position of the ball using the Euclidean distance between the ball position in pixel space and the center of the respective grid cell in the path, also in pixel space.

    • Find the grid cell at index min(index of closest cell + 5, index of last cell), and return the center point of that cell in pixel space as the target position

  • Convert both the ball and target positions from pixel space to meters using a pre-measured px/m ratio

  • Publish the x and y coordinates of the ball position and the target position in meters to the maze_info topic via a maze_message
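A minimal sketch of the target-selection and unit-conversion steps; cell_center and PX_PER_M are hypothetical names standing in for our grid-to-pixel helper and the pre-measured ratio.

```python
import numpy as np

PX_PER_M = 1850.0  # hypothetical pre-measured pixels-per-meter ratio

def pick_target(path_cells, ball_px, cell_center, lookahead=5):
    """Return (ball_m, target_m), both in meters, for the /maze_info message."""
    # Closest path cell to the ball, by Euclidean distance in pixel space.
    centers = np.array([cell_center(c) for c in path_cells])
    closest = int(np.argmin(np.linalg.norm(centers - np.array(ball_px), axis=1)))

    # Look a few cells ahead along the path, clamped to the last cell.
    target_px = centers[min(closest + lookahead, len(path_cells) - 1)]

    # Convert both positions from pixel space to meters.
    return np.array(ball_px) / PX_PER_M, target_px / PX_PER_M
```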

maze_message.msg

Our maze_solver node publishes maze_message messages to the /maze_info topic. A maze_message (seen at left) is a simple message that contains the x and y positions of the ball and of the target, in meters. It also has optional fields for communicating the ball's velocity. We originally used these fields to publish velocity figures calculated in the maze_solver node, but later moved this calculation to the controller code, where we found it to be more effective.
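The exact field names live in the figure; a representative .msg definition might look like the following, with hypothetical field names:

```
# maze_message.msg (field names are illustrative). Positions in meters.
float64 ball_x
float64 ball_y
float64 target_x
float64 target_y
# Optional ball velocity, now computed in the controller instead.
float64 vel_x
float64 vel_y
```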

Control algorithm - 2D Ball and Beam Setup

We formulate our control problem as a 2D ball-and-beam problem. The closed-loop sampling ran at 11-12 Hz, which was too slow to implement model-based control, so we employed PID control, which turned out to be effective in this case. Details on our approach are below, with a code sketch after the list.


  • At startup, we set the Sawyer joint positions to a pre-determined initial configuration where the maze is flat under the camera and we are not close to a singularity when considering the 6th and 7th joints (the only joints we use to manipulate the maze in real time)

  • The controller subscribes to the maze_info topic where it reads in the ball position and the desired position determined by the maze_solver node.

  • From here, we use PID control to calculate the desired acceleration of the ball in both the x and y directions (see equation 3 in the figure above)

    • To calculate the D term, we keep track of the n most recent ball positions and use a moving average to estimate the velocity.

  • We then convert this desired acceleration to desired joint angles using 2D-ball-and-beam dynamics (see equation 2 in the figure above)

  • We then use the Limb library to set the Sawyer joint positions to the desired joint angles we calculated.
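A minimal per-axis sketch of this loop, standing in for equations 2 and 3 in the figure; the gains are placeholders, and the 5/7 factor assumes a ball rolling without slipping.

```python
import math
from collections import deque

G = 9.81                      # gravity, m/s^2
KP, KI, KD = 4.0, 0.0, 1.5    # illustrative gains, not our tuned values
history = deque(maxlen=5)     # n most recent (time, position) samples
integral = 0.0

def desired_angle(ball_pos, target_pos, t, dt):
    """PID on one axis of the ball position -> desired maze tilt (rad)."""
    global integral
    history.append((t, ball_pos))

    error = target_pos - ball_pos
    integral += error * dt

    # D term: average velocity over the recent position samples.
    velocity = 0.0
    if len(history) >= 2 and history[-1][0] > history[0][0]:
        (t0, p0), (t1, p1) = history[0], history[-1]
        velocity = (p1 - p0) / (t1 - t0)

    # Desired ball acceleration from PID (analogous to equation 3).
    accel = KP * error + KI * integral - KD * velocity

    # Invert the ball-and-beam dynamics (analogous to equation 2):
    # accel = (5/7) * g * sin(theta)  =>  theta = asin(7 * accel / (5 * g)).
    return math.asin(max(-1.0, min(1.0, 7.0 * accel / (5.0 * G))))
```

The returned angle for each axis then goes through the saturation and dead-zone adjustments described in the next section before being sent to the Limb interface.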

Control package schematic

Control - implementation tuning

To achieve smoother and more effective control, we made two slight modifications to the PID algorithm (sketched in code after this list).

  • Saturation for the desired angle in each direction is set to a small value

    • This maintains the small-angle assumption, keeping the vision-based ball and target position values accurate

    • It also limits the velocity of the ball so the robot is able to react in time

  • A small constant angle value is added to the desired angle to overcome the dead zone of the robot joints and keep the ball moving in case of vision error in a particular location
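A minimal sketch of these two adjustments applied to the raw PID angle; the numeric limits are illustrative.

```python
import math

MAX_ANGLE = math.radians(4.0)       # saturation: stay in the small-angle regime
DEAD_ZONE_BIAS = math.radians(0.4)  # constant nudge past the joint dead zone

def tune_angle(theta):
    """Clamp the commanded tilt and bias it in the commanded direction."""
    theta = max(-MAX_ANGLE, min(MAX_ANGLE, theta))
    return theta + math.copysign(DEAD_ZONE_BIAS, theta)
```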

Hardware

Tripod

We used an Amazon Basics 50-inch Lightweight Camera Mount Tripod Stand to mount our webcam over the system. We used the first two segments of the telescoping arms and placed the tripod on a table.

Sawyer arm

We used a Sawyer robotic arm from Rethink Robotics to manipulate the maze. The arm has 7 joints. We used the last two joints to control the roll and pitch of the maze. We removed the gripper from the arm and screwed in our maze apparatus.

The maze

The maze itself was 3D printed out of standard PLA plastic. The walls of the maze were 1 inch tall, and the length and width of the maze were both 5 inches. The north wall of the maze included a reinforced section with several holes measured to align with the holes in the Sawyer gripper. 1-inch screws with 1/2-inch spacers were used to mount the maze to the arm such that it was rigidly attached and the black boundary of the maze was fully visible to the webcam.

The paths in the maze were lined with laser-cut printer paper to create contrast with the walls and provide a smooth surface for the ball to travel on. Colored paper was used to mark end points in the maze.

The ball

We used colored 14 mm diameter plastic marbles as the balls that we navigated through the maze. The robot was able to handle a variety of ball colors simply by updating a variable in the position detection code.

Webcam

We used a Logitech C270 USB webcam, which has 720p resolution and a 30 Hz sampling frequency.

Full setup

At left find a picture of our full setup, with tripod, webcam, maze and robotic arm.