Computer vision has many subfields, such as segmentation, object detection and recognition, and egocentric vision.
Our focus in this project is traditional stereo, which is the fundamental problem of visual servoing.
Some may argue that this is a solved problem already, yet I think there is still much interesting research related to it.
The platform is the AR Drone, a quadrotor with several built-in sensors.
Our job is to utilize its frontal camera to find a certain color pattern, and fly towards it.
Since the AR Drone is lifted and propelled by four rotors, it is relatively stable and easy to control.
Yet battery life is the Achilles' heel of all electronic equipment, including this one.
Finding the camera pose is the fundamental problem of visual servoing.
To relate coordinates in the world to coordinates in the image, the first step is geometric camera calibration.
There are two transformations we need to solve:
1. Extrinsic parameters (camera pose)
Assuming we know the world coordinates, the rigid-body transformation has 6 DOF:
$p_c = R\,p_w + t$
where $p_c$ is the point in the camera frame, $R$ is the rotation from world to camera frame, $p_w$ is the point in the world frame, and $t$ is the translation from world to camera frame.
2. Intrinsic parameters
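Now, going from 3D camera coordinates to the image, we add skew and aspect ratio:
$\text{intrinsics} = \begin{bmatrix} f & s & x'_c \\ 0 & a f & y'_c \\ 0 & 0 & 1 \end{bmatrix}$
where f is the focal length, s is the skew, (x'c, y'c) is the principal point, and a is the aspect ratio.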
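The two transformations can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the extrinsics are taken to act as p_c = R p_w + t, the intrinsic matrix is assumed to have the common form [[f, s, x'c], [0, a*f, y'c], [0, 0, 1]], and all function names and numeric values below are hypothetical, not taken from the project code.

```python
import math

def rotation_z(theta):
    """Rotation about the z axis by theta radians, as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def world_to_camera(R, t, p_w):
    """Extrinsics: p_c = R p_w + t (6 DOF: 3 for rotation, 3 for translation)."""
    return [sum(R[i][j] * p_w[j] for j in range(3)) + t[i] for i in range(3)]

def intrinsic_matrix(f, a, s, xc, yc):
    """Assumed intrinsic form with focal length f, aspect ratio a, skew s,
    and principal point (xc, yc)."""
    return [[f, s, xc],
            [0.0, a * f, yc],
            [0.0, 0.0, 1.0]]

def camera_to_pixel(K, p_c):
    """Intrinsics: multiply by K, then divide by the third coordinate."""
    u, v, w = (sum(K[i][j] * p_c[j] for j in range(3)) for i in range(3))
    return (u / w, v / w)

# Made-up example values: camera aligned with the world axes, shifted 2 m along z.
R = rotation_z(0.0)
t = [0.0, 0.0, 2.0]
K = intrinsic_matrix(f=500.0, a=1.0, s=0.0, xc=320.0, yc=240.0)

p_c = world_to_camera(R, t, [0.1, -0.2, 3.0])
pixel = camera_to_pixel(K, p_c)
print(p_c, pixel)
```

Keeping the two steps as separate functions mirrors the split above: the pose changes as the drone moves, while the intrinsics stay fixed for a given camera.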
Eventually, the camera matrix (or M) is the product
M = intrinsics * projection * rotation * translation,
with 11 DOF in total (5 intrinsic and 6 extrinsic). Hence, given a point P in the world frame, we can determine its corresponding point p in the image frame as p = M P in homogeneous coordinates. When there is movement involved, we also need the Jacobian matrix.
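As a rough sketch of how a single camera matrix maps a world point to a pixel, the snippet below builds a 3x4 matrix M = K [R | t] and applies it to a homogeneous point. Folding the projection, rotation, and translation factors into the block [R | t] is an assumption made here for brevity, and the helper names and numbers are hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def camera_matrix(K, R, t):
    """Return the 3x4 matrix M = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # 3x4 extrinsic block
    return matmul(K, Rt)

def project(M, P_w):
    """Map a 3D world point to pixel coordinates using p ~ M [X, Y, Z, 1]^T."""
    P_h = [[P_w[0]], [P_w[1]], [P_w[2]], [1.0]]
    u, v, w = (row[0] for row in matmul(M, P_h))
    return (u / w, v / w)

# Made-up example values.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

M = camera_matrix(K, R, t)
pixel = project(M, [0.1, -0.2, 3.0])
print(pixel)
```

Because M is only defined up to scale, its 12 entries carry 11 DOF, which matches the count given above.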
In this case, the Jacobian describes how a point in the 2D image (camera frame) responds to movement in the 3D world frame (camera motion):
$\dot{p} = J(x, y, Z)\,\xi$
where J(x, y, Z) depends on the point in the image (x, y) and the depth Z from the camera to the point in the world frame. Then, given a twist ξ(w, v), we can find how a point p_star(x_star, y_star) moves.

In this assignment, we used a color pattern instead, see below. Using the color detector, we found its image location. In the end, we applied the inverse Jacobian matrix, together with the pattern's current position in the image and the current twist, to approximate its location in the 3D world. Meanwhile, proportional control was also adopted to adjust the action.

1. Proportional control is crucial; it can prevent the drone from extreme maneuvers. When Kp (proportional gain) increases, the camera pose changes drastically.
[Figure: (1) Kp = 0.01, angular velocity]
2. Depth Z (distance) is hard to approximate, yet using a disparity map is a good choice: Z = f*B / d, where f is the focal length, B is the baseline distance, and d is the disparity.
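The control step can be sketched as follows. This is not the project's actual code: it assumes the standard point-feature image Jacobian from the visual servoing literature in normalized image coordinates, with the twist ordered as (w, v) to match ξ(w, v) above, and a proportional law ξ = -Kp J⁺ (p - p_star), where J⁺ is the right pseudo-inverse. All function names and numeric values are hypothetical.

```python
def image_jacobian(x, y, Z):
    """2x6 image Jacobian for a point (x, y) at depth Z.
    Columns 0-2 multiply angular velocity w, columns 3-5 linear velocity v."""
    return [[x * y, -(1 + x * x), y, -1 / Z, 0.0, x / Z],
            [1 + y * y, -x * y, -x, 0.0, -1 / Z, y / Z]]

def pseudo_inverse_2x6(J):
    """Right pseudo-inverse J^T (J J^T)^-1 of a full-rank 2x6 matrix."""
    a = sum(v * v for v in J[0])
    b = sum(u * v for u, v in zip(J[0], J[1]))
    d = sum(v * v for v in J[1])
    det = a * d - b * b
    inv = [[d / det, -b / det],
           [-b / det, a / det]]  # inverse of the 2x2 matrix J J^T
    return [[J[0][k] * inv[0][c] + J[1][k] * inv[1][c] for c in range(2)]
            for k in range(6)]

def proportional_twist(p, p_star, Z, kp):
    """Twist (w, v) that moves the image point p toward the target p_star."""
    J_pinv = pseudo_inverse_2x6(image_jacobian(p[0], p[1], Z))
    e = [p[0] - p_star[0], p[1] - p_star[1]]  # image-space error
    return [-kp * (row[0] * e[0] + row[1] * e[1]) for row in J_pinv]

def depth_from_disparity(f, B, d):
    """Z = f * B / d, with f and d in pixels and B in meters."""
    return f * B / d

# Made-up example values: pattern seen off-center, target at the image center.
Z = depth_from_disparity(f=500.0, B=0.1, d=25.0)
twist = proportional_twist(p=(0.2, -0.1), p_star=(0.0, 0.0), Z=Z, kp=0.01)
print(Z, twist)
```

With a small gain such as Kp = 0.01, the commanded twist is only a small fraction of the image error, which is in line with the observation that larger gains produce drastic pose changes.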
