Once the arm was in the viewing position, we used computer vision to locate the cups on the table and then converted their image positions into real-world coordinates.
This process was not without its simplifying assumptions:
We assumed the table was in a fixed location relative to the robot. While we would definitely consider recognizing the table in the image as a future step, this simplifying assumption provided us with two things:
Reliability: trusting our table position meant one less variable to worry about, which helped reduce our error margins.
3D positioning: by fixing the physical table location, we could empirically measure where the table sat in space, rather than trying to convert the camera readings directly into a physical location. Again, this isn't a strict requirement, but we felt it was a straightforward simplification with reliability upsides.
We assumed the range of the robot was limited to ~1.3 m to 2.5 m. This assumption allowed us to project the table's surface into a rectangle, find the position of the top of the cup, and then add a reasonable offset to account for the distortion of the cup's height. Not only is this assumption backed up by the physical realities of what our robot was able to do, but it also made our math significantly easier while adding minimal error.
With these assumptions in our back pocket, let’s look at some code!
First we begin listening to the camera. Here, we set a global variable “captured” to False. Captured is our way of ensuring that we stop processing camera data once a cup is identified. We also initialize kps — another global variable which will hold the found cup positions — to empty. Finally, we subscribe to the right_hand_camera topic, which calls back to our function capture_img:
# GLOBAL VARIABLES
kps = [] # global list for detected cup locations
captured = False # global flag for whether image is captured
# -------------------------------------------------------------
sub = rospy.Subscriber("/io/internal_camera/right_hand_camera/image_rect", Image, capture_img) # subscribe to right hand camera
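For completeness, this snippet assumes the usual boilerplate is already in place; here is a minimal sketch of the imports and node setup that would sit above it (the node name is just illustrative):
# Minimal setup sketch (not shown in the original snippet): imports and node registration
import math
import rospy
import cv2
import numpy as np
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

rospy.init_node('cup_detector')  # illustrative node name; register the node before subscribing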
The capture_img function is where the meat of our computer vision happens. If no cups have been captured yet, it converts the image from the image message format ROS provides to a format that OpenCV works with. Then it calls our homography function to isolate the table and cups, tries to identify any cups, and sets our global variables if it finds any.
# callback function for subscriber - takes image message and updates global variables
def capture_img(img_message):
    global kps
    global captured
    # if image already captured, return cup locations
    if captured:
        return kps
    bridge = CvBridge()
    img_cv = bridge.imgmsg_to_cv2(img_message)  # convert image type
    img_ho = homograph_img(img_cv)  # perform homography on image
    kops = find_circles(img_ho)  # run circle detection to find cup locations
    if len(kops) and not captured:
        kps = kops
        captured = True
Our homography works by taking the known corner locations of the table in our image (constant, given assumption (1) and the fixed camera viewing angle), using OpenCV's findHomography function to compute the transform, and then warping the distorted table into a new, straightened image. In this image, we chose pixel values such that each pixel represents 1/10th of an inch, letting us easily convert between identified circle locations and real-life space.
# perform homography on given CV image
def homograph_img(img_cv):
    pts_src = TABLE_IMG_CORNERS
    pts_dst = np.array([[0.0, 0.0], [TABLE_WIDTH, 0.0], [0.0, TABLE_LENGTH], [TABLE_WIDTH, TABLE_LENGTH]])
    # compute the homography matrix
    h, status = cv2.findHomography(pts_src, pts_dst)
    # warp the table region into a straightened, top-down rectangle
    homo_resized = cv2.warpPerspective(img_cv, h, (int(TABLE_WIDTH), int(TABLE_LENGTH)))
    return homo_resized
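We haven't shown the table constants here; as a rough illustration of the convention (the numbers below are made up, not the values we actually measured), a 24 in by 36 in table at one pixel per tenth of an inch would look something like:
# Illustrative constants only - the real values were measured for our table and camera pose
TABLE_WIDTH = 240.0    # 24 in wide table, in deci-inches (= pixels in the warped image)
TABLE_LENGTH = 360.0   # 36 in long table, in deci-inches
TABLE_IMG_CORNERS = np.array([[412, 208], [630, 214], [355, 413], [688, 419]], dtype=np.float32)  # table corners in the camera image, ordered to match pts_dst (NW, NE, SW, SE)
With that convention, a detected circle center in the warped image can be read directly as tenths of an inch from the table corner.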
Once we have our table image, we use OpenCV's Hough Circle detection to locate cups on the table. By applying common-sense radius constraints (based on the real-life size of the cups) and fine-tuning the other Hough parameters, we are able to identify the locations of the cups on the table, which we then return.
# find circle coords (modified from https://stackoverflow.com/questions/60637120/detect-circles-in-opencv)
def find_circles(img_cv):
    minDist = 15
    param1 = 50
    param2 = 30  # smaller value -> more false circles
    minRadius = 20
    maxRadius = 40
    # find circles using Hough Circles
    circles = cv2.HoughCircles(img_cv, cv2.HOUGH_GRADIENT, 1, minDist, param1=param1, param2=param2, minRadius=minRadius, maxRadius=maxRadius)
    if circles is None:
        return []  # no circles found - let the caller retry on the next frame
    circles = np.uint16(np.around(circles))
    # draw the detected circles so a human can sanity-check them
    for i in circles[0, :]:
        cv2.circle(img_cv, (i[0], i[1]), i[2], (255, 255, 255), 2)
    cv2.imshow('img_cv2', img_cv)
    cv2.waitKey(0)
    return circles[0, :, :2]
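One OpenCV detail worth noting: cv2.HoughCircles expects an 8-bit, single-channel image. If the camera topic hands back a color image instead, a quick grayscale conversion before the circle search keeps things happy; a sketch of where that would slot into capture_img:
# Sketch: guard against a 3-channel image before running Hough circle detection
if len(img_ho.shape) == 3:
    img_ho = cv2.cvtColor(img_ho, cv2.COLOR_BGR2GRAY)
kops = find_circles(img_ho)  # run circle detection to find cup locations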
For the sake of consistency, we added a human-in-the-loop step where we confirm that the cup was actually detected. This was especially necessary because sometimes the topic just published an old image. If the detection was performed on a stale image or simply didn't work, we reset the captured and kps variables and reran the vision process. Otherwise, we unsubscribed from the camera topic and moved on to the next step.
while input('redo? y/N: ') != 'N':
    kps = []
    captured = False
    while not captured:
        rospy.sleep(1)
sub.unregister()  # unsubscribe
From the detected circles, extracting the cup's position relative to the corner of the table is straightforward: since each pixel represents a tenth of an inch, the pixel coordinates map directly to a position on the table. Because the circles we detect are the rims of the cups rather than the bases, we add a small offset to the y position on the table (based on the heights of the cups and the viewing angle, adjusted with some empirical testing), giving us a more accurate measure. While this offset should really change with the y position, we found empirically that our robot's range was small enough that this wasn't an issue.
Because the path of the gripper is offset from the base of the arm, we had to do some math to convert the 2D position of the cup on the table into a position in real space. This is not as simple as converting cartesian coordinates to polar coordinates, since the torso pivot (j0) doesn't lie in the plane along which we are throwing: the throwing plane sits a fixed lateral offset away from the pivot, so the in-plane distance to the cup is one leg of a right triangle whose hypotenuse is the pivot-to-cup distance. Thankfully, with a bit of trigonometry, we are able to find the cup distance and angle!
# converts homo image coords (origin at NW corner) to a depth and angle to throw at
def get_cup_pos(x, y):
    # convert deci-inches to ratios of the table dimensions
    x /= TABLE_WIDTH
    y += CUP_HEIGHT  # offset for the cup rim vs. its base (see above)
    y /= TABLE_LENGTH
    assert 0 <= x <= 1 and 0 <= y <= 1  # cup_x and cup_y are ratios of the table dims
    # table dimensions in meters (1 meter = 393.7 deci-inches)
    table_width = TABLE_WIDTH / 393.7
    table_height = TABLE_LENGTH / 393.7
    # dist from robot to table start in meters
    table_depth = TABLE_DEPTH / 393.7
    # get real world cup position
    cup_y = table_depth + table_height * (1 - y)
    cup_x = table_width * x
    # calculate cup depth
    cup_pivot_dist = ((cup_x - PIVOT_OFF)**2 + cup_y**2) ** 0.5
    cup_depth = (cup_pivot_dist**2 - ARM_OFF**2) ** 0.5
    # calculate cup angle (trig stuff)
    cup_pivot_angle = math.atan(cup_x / cup_y)
    cup_pivot_rotated_angle = math.atan(ARM_OFF / cup_depth)
    cup_angle = cup_pivot_angle - cup_pivot_rotated_angle
    return cup_depth + DEPTH_OFFSET, cup_angle
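Tying it together, once the human-in-the-loop check passes, the detected keypoints feed straight into this conversion. A rough sketch of that hand-off for a single cup (variable names beyond those above are our own):
# Sketch: convert the first detected cup into a throwing depth and angle
# kps holds (x, y) pixel coords in the warped table image (deci-inches from the NW corner)
cup_px, cup_py = kps[0]
throw_depth, throw_angle = get_cup_pos(float(cup_px), float(cup_py))
print('cup at depth %.2f m, angle %.2f rad' % (throw_depth, throw_angle))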