Once the arm was in the viewing position, we used computer vision to locate the cups on the table and then converted their image positions into real-world coordinates.
This process was not without its simplifying assumptions:
We assumed the table was in a fixed location relative to the robot. While we would definitely consider recognizing the table in the image as a future step, this simplifying assumption provided us with two things:
Reliability: trusting our table position meant one less variable to worry about, which helped reduce our error margins.
3D positioning: by fixing the physical table location, we could empirically measure where the table sat in space, rather than trying to convert the camera readings directly into a physical location. Again, this isn't a strict requirement, but we felt it was a straightforward simplification with reliability upsides.
We assumed the range of the robot was limited to ~1.3 m to 2.5 m. This assumption allowed us to project the table's surface into a rectangle, find the position of the top of the cup, and then add a reasonable offset to account for the distortion of the cup's height. Not only is this assumption backed up by the physical realities of what our robot was able to do, but it also made our math significantly easier while adding minimal error.
With these assumptions in our back pocket, let’s look at some code!
First we begin listening to the camera. Here, we set a global variable “captured” to False. Captured is our way of ensuring that we stop processing camera data once a cup is identified. We also initialize kps — another global variable which will hold the found cup positions — to empty. Finally, we subscribe to the right_hand_camera topic, which calls back to our function capture_img:
# GLOBAL VARIABLES
kps = [] # global list for detected cup locations
captured = False # global flag for whether image is captured
# -------------------------------------------------------------
sub = rospy.Subscriber("/io/internal_camera/right_hand_camera/image_rect", Image, capture_img) # subscribe to right hand camera
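For completeness, this snippet assumes the usual boilerplate is already in place; here is a minimal sketch of the imports and node setup that would sit above it (the node name is just illustrative):
# Minimal setup sketch (not shown in the original snippet): imports and node registration
import math
import rospy
import cv2
import numpy as np
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

rospy.init_node('cup_detector')  # illustrative node name; register the node before subscribing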
The capture_img function is where the meat of our computer vision happens. If no cups have been captured yet, it converts the image from the image message format ROS provides to a format that OpenCV works with. Then it calls our homography function to isolate the table and cups, tries to identify any cups, and sets our global variables if it finds any.
# callback function for subscriber - takes image message and updates global variables
def capture_img(img_message):
    global kps
    global captured
    # if image already captured, return cup locations
    if captured:
        return kps
    bridge = CvBridge()
    img_cv = bridge.imgmsg_to_cv2(img_message)  # convert image type
    img_ho = homograph_img(img_cv)  # perform homography on image
    kops = find_circles(img_ho)  # run circle detection to find cup locations
    if len(kops) and not captured:
        kps = kops
        captured = True
Our homography works by taking the known corner locations of the table in our image (constant, given assumption (1) and the fixed camera viewing angle), using OpenCV's findHomography function to compute the transform, and then warping the distorted table into a new, straightened image. In this image, we chose pixel values such that each pixel represents 1/10th of an inch, letting us easily convert between identified circle locations and real-life space.
# perform homography on given CV image
def homograph_img(img_cv):
    pts_src = TABLE_IMG_CORNERS
    pts_dst = np.array([[0.0, 0.0], [TABLE_WIDTH, 0.0], [0.0, TABLE_LENGTH], [TABLE_WIDTH, TABLE_LENGTH]])
    # compute the homography matrix
    h, status = cv2.findHomography(pts_src, pts_dst)
    # warp the table region into a straightened, top-down rectangle
    homo_resized = cv2.warpPerspective(img_cv, h, (int(TABLE_WIDTH), int(TABLE_LENGTH)))
    return homo_resized
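We haven't shown the table constants here; as a rough illustration of the convention (the numbers below are made up, not the values we actually measured), a 24 in by 36 in table at one pixel per tenth of an inch would look something like:
# Illustrative constants only - the real values were measured for our table and camera pose
TABLE_WIDTH = 240.0    # 24 in wide table, in deci-inches (= pixels in the warped image)
TABLE_LENGTH = 360.0   # 36 in long table, in deci-inches
TABLE_IMG_CORNERS = np.array([[412, 208], [630, 214], [355, 413], [688, 419]], dtype=np.float32)  # table corners in the camera image, ordered to match pts_dst (NW, NE, SW, SE)
With that convention, a detected circle center in the warped image can be read directly as tenths of an inch from the table corner.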
Once we have our table image, we use OpenCV's Hough Circle detection to locate cups on the table. By applying common-sense radius constraints (based on the real-life size of the cups) and fine-tuning the other Hough parameters, we are able to identify the locations of the cups on the table, which we then return.
# find circle coords (modified from https://stackoverflow.com/questions/60637120/detect-circles-in-opencv)
def find_circles(img_cv):
    minDist = 15
    param1 = 50
    param2 = 30  # smaller value -> more false circles
    minRadius = 20
    maxRadius = 40
    # find circles using Hough Circles
    circles = cv2.HoughCircles(img_cv, cv2.HOUGH_GRADIENT, 1, minDist, param1=param1, param2=param2, minRadius=minRadius, maxRadius=maxRadius)
    if circles is None:
        return []  # no circles found - let the caller retry on the next frame
    circles = np.uint16(np.around(circles))
    # draw the detected circles so a human can sanity-check them
    for i in circles[0, :]:
        cv2.circle(img_cv, (i[0], i[1]), i[2], (255, 255, 255), 2)
    cv2.imshow('img_cv2', img_cv)
    cv2.waitKey(0)
    return circles[0, :, :2]
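One OpenCV detail worth noting: cv2.HoughCircles expects an 8-bit, single-channel image. If the camera topic hands back a color image instead, a quick grayscale conversion before the circle search keeps things happy; a sketch of where that would slot into capture_img:
# Sketch: guard against a 3-channel image before running Hough circle detection
if len(img_ho.shape) == 3:
    img_ho = cv2.cvtColor(img_ho, cv2.COLOR_BGR2GRAY)
kops = find_circles(img_ho)  # run circle detection to find cup locations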
For the sake of consistency, we added a human-in-the-loop step where we confirm that the cup was actually detected. This was especially necessary because sometimes the topic just published an old image. If the detection was performed on a stale image or simply didn't work, we reset the captured and kps variables and reran the vision process. Otherwise, we unsubscribed from the camera topic and moved on to the next step.
while input('redo? y/N: ') != 'N':
    kps = []
    captured = False
    while not captured:
        rospy.sleep(1)
sub.unregister()  # unsubscribe
From the detected circles, extracting the cup's position relative to the corner of the table is straightforward: since each pixel represents a tenth of an inch, the pixel coordinates map directly to a position on the table. Because the circles we detect are the rims of the cups rather than the bases, we add a small offset to the y position on the table (based on the heights of the cups and the viewing angle, adjusted with some empirical testing), giving us a more accurate measure. While this offset should really change with the y position, we found empirically that our robot's range was small enough that this wasn't an issue.
Because the path of the gripper is offset from the base of the arm, we had to do some math to convert the 2D position of the cup on the table into a position in real space. This is not as simple as converting cartesian coordinates to polar coordinates, since the torso pivot (j0) doesn't lie in the plane along which we are throwing: the throwing plane sits a fixed lateral offset away from the pivot, so the in-plane distance to the cup is one leg of a right triangle whose hypotenuse is the pivot-to-cup distance. Thankfully, with a bit of trigonometry, we are able to find the cup distance and angle!
# converts homo image coords (origin at NW corner) to a depth and angle to throw at
def get_cup_pos(x, y):
    # convert deci-inches to ratios of the table dimensions
    x /= TABLE_WIDTH
    y += CUP_HEIGHT  # offset for the cup rim vs. its base (see above)
    y /= TABLE_LENGTH
    assert 0 <= x <= 1 and 0 <= y <= 1  # cup_x and cup_y are ratios of the table dims
    # table dimensions in meters (1 meter = 393.7 deci-inches)
    table_width = TABLE_WIDTH / 393.7
    table_height = TABLE_LENGTH / 393.7
    # dist from robot to table start in meters
    table_depth = TABLE_DEPTH / 393.7
    # get real world cup position
    cup_y = table_depth + table_height * (1 - y)
    cup_x = table_width * x
    # calculate cup depth
    cup_pivot_dist = ((cup_x - PIVOT_OFF)**2 + cup_y**2) ** 0.5
    cup_depth = (cup_pivot_dist**2 - ARM_OFF**2) ** 0.5
    # calculate cup angle (trig stuff)
    cup_pivot_angle = math.atan(cup_x / cup_y)
    cup_pivot_rotated_angle = math.atan(ARM_OFF / cup_depth)
    cup_angle = cup_pivot_angle - cup_pivot_rotated_angle
    return cup_depth + DEPTH_OFFSET, cup_angle
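Tying it together, once the human-in-the-loop check passes, the detected keypoints feed straight into this conversion. A rough sketch of that hand-off for a single cup (variable names beyond those above are our own):
# Sketch: convert the first detected cup into a throwing depth and angle
# kps holds (x, y) pixel coords in the warped table image (deci-inches from the NW corner)
cup_px, cup_py = kps[0]
throw_depth, throw_angle = get_cup_pos(float(cup_px), float(cup_py))
print('cup at depth %.2f m, angle %.2f rad' % (throw_depth, throw_angle))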