Setup:
Blue marker traces a path (a lissajous curve) marked by blue crosses. The red marker is a tracker, which tries to track the trajectory of the blue marker while avoiding obstacles.
The motion model for both of them is a unicycle model.
Technical Approach :
The state of the bodies is given by the x, y co-ordinates of the position and the orientation θ. The control inputs are linear and angular velocities.
We define the error dynamics based on the motion model where
error = (expected state) - (actual state) + noise : we consider Gaussian Noise.
We define a quadratic loss function in terms of the error, find a sequence of control inputs which minimize the expected value of the loss function over a time horizon.
Ideally, we would want the time horizon to be infinite (which gives us a control policy - where you can query the optimal control input as a function of the state of the system). Another approach is to keep the time-horizon finite - which would mean we reduce the number of the optimization variables and quick optimizations mean that we can use this real-time. Note: it is crucial to know the "entire" reference trajectory in the former approach, whereas in the latter, we can do with knowing just for the next few time steps.
Finite-Horizon NLP - solved using CasADi. CasADi has several NLP solvers. We use IPOPT. Since we are performing numerical optimizations, our state and control spaces can be continuous with relevant constraints.
We perform Generalized Policy Iteration to solve this infinite-horizon problem. To implement this, we need a discrete state-space, motion model with Gaussian Noise and hence we discretize the (x,y,θ) variables with fixed interval steps.
We discretize the motion model by using the "nearest neighbors" to the state obtained out of the discrete states.
Setup -
Known goal locations, start locations, unknown key / door locations
5x5, 6x6, 7x7, 8x8 environments : discrete state-space
Find the shortest path to goal while performing the following tasks:
- pick the key (only if necessary)
- open the door (if closed)
- reach the goal
- avoid obstacles
Technical Approach -
Problem formulated as a Markov Decision Process with the following
state-space : x = [i, j, d, key_status, door _ status] where i, j are the co-ordinates in x and y of the marker, d is the heading direction which can be (L, R, U, D) for (left, right, up, down) respectively, key_status is 0 or 1 depending on whether we have the key or not and door_status is 0 or 1 depending on whether a particular door is open or close.
control-space : U = {MF, T L, T R, PK, UD} where (MF = move forward, TL = turn left, TR = turn right, PK = pick key, UD = unlock door)
stage and terminal costs : stage-cost = 1 to move/turn or basically implement any of the control actions when we are NOT reaching the goal in the next step.
stage-cost = 0 to move/turn or implement a control action that takes us to the goal
terminal-cost = 0 if we are at the goal else ∞
Cost function for Dynamic Programming : V = stage cost + terminal cost
Solved using the Dynamic Programming Algorithm
Detailed Algorithm explained in the report below.
Setup:
Stereo cameras installed on the car capture images at fixed intervals and certain landmarks are already identified. The location of these landmarks in the left and the right camera is already provided to us as a part of the image data. We also have the information on which landmarks are obtained in which camera frame over time.
IMU on the car provides - angular and linear accelerations. Vehicle Motion model - given by Pose Dynamics in SE(3).
We wish to get the best estimate of the path taken by the vehicle and the location of the landmarks using an Extended Kalman Filter.
Technical Approach:
Landmark Prior and raw vehicle trajectory : We first obtain the raw trajectory (position, orientation) by integrating IMU data over time. From the image data, knowing the camera calibration matrix, we obtain a "prior" estimate of landmark locations in the camera frame, and then in the world frame by transformation (since the vehicle pose is known).
Assuming a noisy observation model, we calculate the "expected measurement" of pixel-co-ordinates for each landmark at each time instant. Using the error between the actual measurement and the expected measurement and an Extended Kalman Filter (Kalman gain calculated using the derivative of the non-linear camera projection function) we update our estimate of the landmark positions in the world. We also calculate the uncertainty in the landmark positions.
Using these updated landmark positions we calculate the "updated" pose of the vehicle for that instant (by essentially using the exponential of the hat map of the innovation term, since vehicle pose is in SE(3) - check project report for more info).
Finally, using the Kalman Gain, we update the uncertainty in the vehicle pose and the landmark locations. Using this new uncertainty we move to the next time step and repeat the above two steps. The Kalman gain for the next step basically uses this updated uncertainty.
3D spaces -
Setup : We are given several 3D spaces with formed using Axis-aligned Bounding boxes. A start and a goal position is given and the goal is to find a path from the start to the goal position.
Technical Approach:
We write a function based on geometry to determine whether a segment collides with a given axis-aligned bounding box.
Search Based : we use a weighted A-star search where the weight is added along with the heuristic. A high weight aggressively moves us towards goal (meaning the path can be sub-optimal) whereas a low weight explores more (meaning more time to find a path but more likely an optimal one with respect to the discretization performed)
Sampling Based : we use RRT-star and RRT-connect algorithms. While RRTs explore spaces rapidly, they do take time in case the path to the goal goes through narrow gaps like in mazes. RRT-connect is relatively faster since it contains trees exploring from both the goal-side and the start-side. Jagged paths obtained are pruned.
2D Mazes -
Setup : We are given 2D mazes, where we implement search-based algorithms going from a start to a goal location
Technical Approach :
We implement BFS, DFS and A-star algorithms, with multiple heuristics in A-star.
We observe how BFS and A-star provide optimal paths while DFS doesn't.
Setup :
We first have to set up an Iterative Point-Cloud Matching framework. The goal is generate an occupancy map with ICP on LiDAR data obtained from a mobile robot, along with a texture map of the ground using images obtained from a camera.
Finally, we have to perform Factor-graph SLAM to obtain the robot trajectory.
Technical Approach :
Iterative Point-Cloud matching - We perform scan-matching on two sets of point-clouds using the ICP algorithm invoking the Kabsch Algorithm and determine the avg. relative orientation between two sets of point-clouds. This sets up a function for data-association given two sets of point clouds.
LiDAR-based SLAM -
We used ICP on Lidar data (LiDAR mounted on bot) of consecutive time-steps to determine the relative pose between the current and previous poses of the bot and generated a trajectory estimate using the sequence of relative poses obtained.
Implemented Factor-Graph SLAM using the sequence of relative poses and fixed-interval loop-closure using the GTSAM library to optimize the trajectory estimate - compared this with the raw estimate and Odometry based-estimate.
Occupancy and Texture Mapping -
We generate a 2D occupancy map from the LiDAR scan data by first transforming points from the LiDAR frame to the world frame(since the orientation of the bot is now known) and then using the Bresenham2D algorithm. We assign a probability to every point on the map of being occupied, free or unexplored and map it on a color-bar as shown below. By overlaying images obtained from a ground-facing camera on this occupancy map, we create a texture map.
Setup:
We are provided with time-stamped IMU data, camera images (RGB) from a mobile robot along with VICON data for the orientation of the body of the robot. We wish to calibrate the noisy IMU measurements and estimate the orientations and accelerations of the body and compare them with the ground truth (VICON data).
Based on the obtained orientations, we wish to stitch a panorama from images captured from the camera in these orientations.
Technical Approach:
We calibrate the IMU data based on the accelerometer and gyro sensitivities from the IMU datasheet and eliminate initial unstable readings.
Based on the quaternion-kinematics motion model we obtain the "optimal" sequence of quaternions (orientations) that minimizes the error in the observed data while also obeying the motion model. This is done using Projected Gradient Descent. Check detailed optimization cost function (quadratic loss) in the report.
With the corrected orientations, we transform the images from the camera frame to the world frame and project them on a 2D plane to stitch a panorama.
Setup:
On the Qualcomm RB5 robots (differential drive) we aim to setup the entire Perception-Localization-Mapping-Planning-Control pipeline to navigate an arena surrounded with AprilTags and containing obstacles. We aim to implement a Kalman Filter for localization and designing a plan to move the bot around the arena such that we achieve > 80% coverage.
Technical Approach:
Calibration and Control - Control commands provided are motor velocities for each of the 4 wheels. We first create a calibration matrix for the motor that scales the input values such that all motors have the correct desired value of angular velocity for rotation. This needs to be done since there are non-uniformities in the commands provided and the output generated for each motor. Moving from one pose to another - done using a PID control, where feedback on the robot pose is provided using AprilTags.
Perception - We calibrate a monocular camera mounted on the robot and perform April-Tag detection using OpenCV. This provides us the pose of the AprilTag in the frame of the camera.
Localization and Mapping - We place AprilTags at fixed locations around the arena, and hence we determine the robot pose by transforming the above camera-frame measurement first into the AprilTag frame and then into the world frame. Of course the measurements are noisy and we implement a Kalman Filter to perform localization and mapping. We make the robot move around in various shapes in the arena (like a square, triangle, hexagon etc.) multiple times to improve our mapping estimates. We observe that robot motion is ~5-10cm off the desired motion given the noisy measurements.
Motion Planning - Waypoints are followed using a PID control, the feedback for which is received using the above SLAM strategy. Hence in planning, we generate a set of waypoints to be followed. Using an A-star algorithm with Euclidean and Manhattan heuristics, we generate shortest and safest paths to the goal positions and discretize them into waypoints for the robot to follow.
Arena Coverage - Using the above pipeline, we provide the robot with a path that starts from the boundary of the arena and spirals inwards to the center of the square arena, with a pitch small enough to ensure that most of the area is covered.
Refer to the reports below for more details on the setup and implementation which is done using Python and ROS.
Algorithm codes are stored and implemented on the robot microcontroller and can be edited using a remote connection (ssh).
Localization based on AprilTag detection
A-star based shortest path (with a safety margin around obstacle)
Space coverage using vision-based mapping and localization : >80% coverage
Setup:
We aim to perform surface normal detection using RGB images. We also identify 3D canonical co-ordinate frames for the image and their 2D projections.
This project is adopted from the work - Huang, J., Zhou, Y., Funkhouser, T., Guibas, L.J.: Framenet: Learning local canonical frames of 3d surfaces from a single rgb image.
Technical Approach:
We perform Joint-Estimation of the surface-normal(n), the 3D tangent principal directions x0, y0 and their 2D projections xi, yj. The surface normal is a 3 dimensional vector, so are the tangent principal directions xo, yo and their 2D projections xi, yj are 2 dimensional each making it a 13 dimensional vector that we can learn on.
The loss function is a combination of several energies (L, P, N, C, O) each of which is an L2 loss. L = L2 loss between the 2D projections of principal directions and the ground truth. P = L2 loss between the 3D tangent principal directions and the ground truth. N = L2 loss between the surface normal and its ground truth. C = L2 loss between the estimated projection(calculated from xo, yo) and the actual projection (xi, yj) of each tangent principal direction. O = L2 loss of the error in surface normal and the cross-product of the tangent directions (since the cross product of the tangent vectors should be along the normal). Total loss is simply the sum of all of these losses.
These 13 parameters are learnt by implementing a DORN architecture which has a ResNet 101 backbone. We use the ScanNet dataset for training and validation and the NYUv2 dataset for testing. With limited computational resources, we restrict to using just 10% of the dataset.
Surface normal detection - comparison between the original and trained models (DORN - trained on full dataset and 10% dataset - ScanNet)
Identifying 3D canonical directions - principal axes from the image
Setup:
We have an AUV at some underwater location and we wish to guide to the "home" position which is a dock with co-ordinates (0,0,0) using a control algorithm.
The AUV is actuated by 8 thrusters that control torques along roll, pitch, yaw and the surge velocity.
We then use the same control algorithm to make the AUV follow a set of waypoints underwater.
Using a SONAR attached on the AUV, we perform seabed mapping while the AUV follows some waypoints.
Technical Approach :
Implemented a sliding-surface control to guide an AUV from any underwater location to the origin (0,0,0) on the surface.
We assume a constant surge velocity v and control the heading of the AUV using the yaw and the pitch. The idea is to adjust the heading (pitch and yaw) until we are pointing "straight" to the origin. Once we are pointing towards the origin, we can purely control the surge velocity to make us move along the straight line towards the origin. We assume the "direction" to the origin is known.
Sliding mode control : If the current yaw, pitch are y, p and at the current instant, the yaw and pitch which point exactly towards the origin are y0, p0, then we provide an angular velocity along the yaw and pitch using :
w_y = Ksgn(y0 - y) ; w_p = Msgn(p0 - p) | K, M are constants to be tuned | we get feedback on sgn(y0-y) and sgn(p0-p) | sgn(x) = sign of x
It's important to note that the control is applied while having a 'constant' surge velocity, which means our AUV "slides" until we align the heading.
While the control-law gives us the angular velocity to be applied, we convert it to the desired thruster torque based on the dynamics of the vehicle. For the purpose of the project we operate in the conditions where the torque required is proportional to angular and linear velocities.
We use the above algorithm to perform waypoint-following underwater, where we also need additional downward thrust to counter the upward force of water.
We then perform SONAR-based mapping of the seabed topography while navigating underwater simply using point-cloud data and transforming it into the world-frame considering we know the state (position, orientation) of the AUV.
Setup :
We aim to use Fully Convolutional Neural Networks for Semantic Segmentation.
Technical Approach :
We use the VGG-16 backbone. FCN-32 is a small variation of VGG-16 with 1x1 convolutions in place of fully-connected layers and a Transpose convolution at the end to give a dense prediction output. There are NO skip connections.
We then use the FCN-8 architecture we pretrained weights and compare the performance with FCN-32. Clearly FCN-8 with skip connections, gives a better output given the unsampling factor of 8 which helps capture finer features. Skip connections help combine low resolution and high resolution outputs to give a better segmentation result.
Setup :
We have a solid block placed on the ground (X-Y plane) and the Franka-Emika Panda 7-DOF robot arm with a camera mounted on the end-effector.
The goal is to position the end-effector such that the solid block lies at the center of the image while the block is stationary and while it is moving. Further, we also deploy a null-space controller to minimize the roll of the camera.
Technical Approach :
We obtain the pixel co-ordinates of the block from the image using OpenGL, based on which we calculate the Image Jacobian - the relation between the object velocity in pixel-space with the linear and angular velocities of the end-effector (camera). The "desired velocity" in pixel-space is calculated based on the desired and current position in the pixel-space.
The velocity is pixel-space is translated to the world-frame through the Image Jacobian, which is further transformed into the joint-space using the joints Jacobian for the robot arm.
We apply a proportional controller for the joints to move the robot arm to the desired configuration.
For Null-space control, we find the camera roll (r) as a function of the joint angles (q) based on the D-H parameters obtained from the Franka-Emika-Panda docs here - https://frankaemika.github.io/docs/control_parameters.html#denavithartenberg-parameters. We encode this in sympy to calculate the gradient of the roll with respect to the joint angles.
The null-space control term is finally calculated using q_null = (I - J'J)Dq where q_null is the null-space control command and Dq is the gradient.
Setup :
We have multiple solid objects moving on specific trajectories in 3D space.
Using a camera mounted on the end-effector of a 7-DOF robotic arm, we wish to center the track (maintain at the image center) the object closest to the camera
We design a Model-Predictive Controller to perform tracking and compare the performance with a baseline PID controller.
Technical Approach :
We first identify the pixel-coordinates of the nearest object using the camera position, orientation, object location and the camera calibration matrix by simply taking a projection in 2D in the frame of the camera.
We design our control in the pixel-coordinate space where the state is the location of the object X = [u, v] in the image. The control space is the joint velocities of the robot U = [q1, q2, q3, q4, q5, q6, q7].
Motion model - Given the joint velocities at the current time step, we "predict" what the state would be at the next time step. Using an Euler-discretized model, we first calculate the joint angles for the next time step. Based on the robot D-H parameters we then find the end-effector pose which then helps us to take a projection in the camera frame to predict the pixel coordinates.
Based on this motion model we formulate a Nonlinear Optimization problem in CasADi with a quadratic cost function considering the error in the state, the magnitude of the control input and a penalty for moving joint angles to extremities.
The above problem is solved using an IPOPT solver with a prediction horizon of N = 10.