System Design

Our system comprises three main subsystems: the User Interface, Navigation, and Grasping subsystems. Each was designed to address one of the robot's high-level functions.

User Interface

The User Interface subsystem is responsible for all interactions between the system and the user. We designed it with the primary objective of providing a simple, natural, and intuitive interaction experience. The components of this subsystem are depicted in the figure below.

The following series of steps illustrates the main flow of processes in the subsystem:

        1.     Gathering user input

        2.     Processing the speech data

        3.     Recognizing the speech and extracting factual details and commands

        4.     If any of the previous steps failed, report the problem to the user immediately and skip the remaining step

        5.     If all the previous steps succeeded, pass the details to the navigation subsystem

The user's spoken command is acquired with a multi-array microphone. The speech data are first conditioned to suppress ambient noise and then decoded using pre-trained acoustic models. The decoded speech is converted into plain text, from which the factual information and the user's commands are extracted using a combination of natural language processing and text processing.

Consider the following example, which illustrates how the subsystem works:

The user could say: “BUD-E bring my pill box from the bed room.”

The subsystem acquires the audio data, recognizes the speech, converts it into text, and then identifies and extracts the contextually useful information: “bring”, “pill box” and “bed room”. It then passes the task details (bring), the object details (pill box) and the destination location (bed room) to the navigation subsystem for further processing.
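The extraction step in this example can be sketched as follows. This is only a minimal illustration that assumes simple keyword matching over the recognized text. The vocabularies and the names parse_command and Command are hypothetical and do not come from the actual system, which combines natural language processing with text processing.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Hypothetical vocabularies for illustration; the real word lists are not given.
TASKS = {"bring", "fetch", "get"}
OBJECTS = {"pill box", "water bottle", "soda can"}
LOCATIONS = {"bed room", "kitchen", "living room"}


class Command(NamedTuple):
    task: str
    obj: str
    location: str


def _first_match(text: str, phrases: Iterable[str]) -> Optional[str]:
    """Return the phrase that occurs earliest in text as whole words, or None."""
    best, best_pos = None, len(text) + 1
    for phrase in phrases:
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match and match.start() < best_pos:
            best, best_pos = phrase, match.start()
    return best


def parse_command(utterance: str) -> Optional[Command]:
    """Extract task, object and location from recognized text.

    Returns None when any of the three fields is missing, so that the
    caller can report the problem to the user.
    """
    text = utterance.lower()
    task = _first_match(text, TASKS)
    obj = _first_match(text, OBJECTS)
    location = _first_match(text, LOCATIONS)
    if task is None or obj is None or location is None:
        return None
    return Command(task, obj, location)
```

When any of the three fields is missing, the function returns None, which corresponds to the case where the subsystem reports the problem to the user instead of passing the task on.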

This subsystem is also responsible for communicating the status of the system to the user. We consider this interaction important because human-robot interaction is one of the key elements of a consumer robot. For example, the system tells the user if it was unable to hear the command properly or if it had difficulty formulating its task objectives from the speech input. It also reports any difficulties encountered while executing its plan and, apart from errors and difficulties, the status of task completion. This way, the user gets the necessary feedback and status updates.


Navigation

The navigation subsystem is responsible for the system’s high-level motion planning. It coordinates with the other subsystems to achieve the overall system objective. Once the task details are obtained from the user through the User Interface subsystem, the system plans its path to the destination. The flow of information inside this subsystem is described in the figure below.

Navigation stack.

Main components of the navigation stack:

1.     Mapping and Localization

2.     Path Planning

            - Global path planning

            - Local path planning

3.     Obstacle Avoidance

            - Static obstacle avoidance

            - Dynamic obstacle avoidance

Mapping and Localization

The Hector SLAM approach we implemented can be used without odometry, as well as on platforms that exhibit roll or pitch motion (of the sensor, the platform, or both). It leverages the high update rate of LIDAR systems such as the Hokuyo and provides 2D pose estimates at the scan rate of the sensor.

The node uses tf to transform the scan data, so the LIDAR does not have to be fixed relative to the specified base frame.
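The geometry behind this transformation can be illustrated with a small planar sketch. It does not use the tf API; it only assumes that the laser's pose in the base frame is known as a hypothetical (x, y, yaw) triple and applies the corresponding rigid transform to each range reading. The function and parameter names are illustrative.

```python
import math


def scan_to_base_frame(ranges, angle_min, angle_increment, laser_pose):
    """Convert polar range readings into (x, y) points in the base frame.

    laser_pose is the assumed planar pose (x, y, yaw) of the laser
    expressed in the base frame.
    """
    lx, ly, yaw = laser_pose
    points = []
    for i, r in enumerate(ranges):
        if not math.isfinite(r):
            continue  # skip invalid returns such as inf or NaN
        theta = angle_min + i * angle_increment
        # Point expressed in the laser frame.
        px, py = r * math.cos(theta), r * math.sin(theta)
        # Rotate by the laser's yaw, then translate by its offset.
        bx = lx + px * math.cos(yaw) - py * math.sin(yaw)
        by = ly + px * math.sin(yaw) + py * math.cos(yaw)
        points.append((bx, by))
    return points
```

Because the pose is an argument rather than a constant, the same function works when the laser moves relative to the base, provided the current pose is supplied with each scan.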


hector_slam block diagram.

Generated map along with the path traversed (SLAM)

For 3D SLAM, we implemented the ccny_slam technique, a tool for fast visual odometry and mapping from RGB-D data. The keyframe_mapper node in the CCNY_RGBD tools supports graph-based offline SLAM. It also serves as the online map server, which can save, load, and visualize point cloud maps.

Unlike other systems (e.g., RGBD-SLAM), this visual odometry system does not use dense data, feature descriptor vectors, RANSAC alignment, or keyframe-based bundle adjustment. By avoiding these computations, it can achieve an average processing rate of 60 Hz.

3D map of a room reconstructed by a handheld RGB-D camera.

In our implementation, this technique was faster and more robust than techniques such as RGBD-SLAM.

Path planning

-> Costmaps

We used two costmaps to store information about obstacles in the world. The global costmap is used for global planning, that is, creating long-term plans over the entire environment. The local costmap is used for local planning and dynamic obstacle avoidance. The costmap_2d package provides an implementation of a 2D costmap that takes in sensor data from the world, builds a 2D occupancy grid from the data, and inflates costs based on a user-specified inflation radius.
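The idea of inflating costs around obstacles can be sketched as follows. This is a simplified illustration and not the cost function used by costmap_2d: it assumes a small occupancy grid stored as nested lists, a hypothetical lethal cost of 254, and a cost that decays linearly with distance, all chosen only for the example.

```python
import math

LETHAL = 254  # illustrative cost assigned to an occupied cell


def inflate(grid, inflation_radius_cells):
    """Return a cost grid in which costs fall off linearly around obstacles.

    grid is a list of rows where truthy cells are occupied. The radius is
    expressed in cells.
    """
    rows, cols = len(grid), len(grid[0])
    cost = [[0] * cols for _ in range(rows)]
    reach = int(math.ceil(inflation_radius_cells))
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            for di in range(-reach, reach + 1):
                for dj in range(-reach, reach + 1):
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < rows and 0 <= nj < cols):
                        continue
                    dist = math.hypot(di, dj)
                    if dist > inflation_radius_cells:
                        continue
                    # Linear decay: LETHAL at the obstacle, lower further away.
                    value = LETHAL * (1.0 - dist / (inflation_radius_cells + 1.0))
                    cost[ni][nj] = max(cost[ni][nj], int(round(value)))
    return cost
```

A larger inflation radius keeps planned paths further from obstacles at the price of making narrow passages look more expensive.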

-> Global Path planner

The global path planner generates a high-level plan for the navigation stack to follow. It is fed with the obstacle and cost information contained in the global costmap, the pose estimate from the robot’s localization system, and a goal in the world (set in the Rviz window). From this information it creates a series of waypoints for the local planner to follow to the goal location. To plan the global trajectory from the initial position to the goal position, we implemented Dijkstra’s algorithm.
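Dijkstra's algorithm on a grid costmap can be sketched as follows. This is a minimal illustration rather than the planner running on the robot: it assumes a 4-connected grid stored as nested lists, a step cost of one plus the cost of the entered cell, and a hypothetical lethal threshold of 254.

```python
import heapq


def dijkstra(cost, start, goal, lethal=254):
    """Return the cheapest 4-connected path from start to goal, or None.

    cost is a list of rows of cell costs; start and goal are (row, col).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if cost[nr][nc] >= lethal:
                continue  # occupied cells cannot be entered
            nd = d + 1.0 + cost[nr][nc]
            if nd < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = nd
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (nd, (nr, nc)))
    if goal not in dist:
        return None
    # Walk the parent links back from the goal to recover the waypoints.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path
```

The returned list of cells plays the role of the waypoints handed to the local planner.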

Screenshot demonstrating the robot's mapping, localization, and global path planning in Rviz.

-> Local Path planner and Dynamic Obstacle Avoidance

The local planner is responsible for generating velocity commands for the mobile base that safely move the robot towards a goal. It is seeded with the plan produced by the global planner and attempts to follow it as closely as possible while taking into account the kinematics and dynamics of the robot as well as the obstacle information stored in the local costmap. We used a local costmap in addition to the global costmap because it reflects the obstacles currently observed by the sensors rather than the situation that existed when the global (static) map was created. This is what enables BUD-E to perform dynamic obstacle avoidance.
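One common way to generate such velocity commands is trajectory rollout: sample candidate velocities, simulate each one forward, discard those that collide, and keep the one that ends closest to the next waypoint. The sketch below illustrates that idea only. The text does not name the local planner used on BUD-E, so the unicycle motion model, the candidate velocities, the time step, the robot radius, and the representation of obstacles as points are all assumptions made for the example.

```python
import math


def simulate(pose, v, w, dt=0.1, steps=10):
    """Forward-simulate a unicycle model and return the visited poses."""
    x, y, th = pose
    traj = []
    for _ in range(steps):
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        traj.append((x, y, th))
    return traj


def choose_velocity(pose, waypoint, obstacles, robot_radius=0.2,
                    linear=(0.0, 0.2, 0.4),
                    angular=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the (v, w) command whose rollout ends closest to the waypoint.

    obstacles is a list of (x, y) points; rollouts that come within
    robot_radius of any obstacle are rejected. Returns None if every
    candidate collides.
    """
    best, best_score = None, float("inf")
    for v in linear:
        for w in angular:
            traj = simulate(pose, v, w)
            collides = any(math.hypot(x - ox, y - oy) <= robot_radius
                           for x, y, _ in traj for ox, oy in obstacles)
            if collides:
                continue
            end_x, end_y, _ = traj[-1]
            score = math.hypot(waypoint[0] - end_x, waypoint[1] - end_y)
            if score < best_score:
                best, best_score = (v, w), score
    return best
```

Because the obstacle list is re-read on every call, an obstacle that appears after the global plan was made changes the chosen command on the next cycle, which is the behaviour described above as dynamic obstacle avoidance.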



Grasping

The grasping subsystem handles all of the system's object manipulation tasks. Its main components are depicted in the figure below. Object recognition algorithms are employed to identify BUD-E’s specially designed basket and a handful of commonly used household objects such as water bottles, soda cans, and beer cans.

Grasping subsystem

The subsystem includes a 5-DoF arm for manipulating objects. Once the robot reaches the goal location, the grasping subsystem detects the objects in the scene and recognizes them. The recognition pipeline identifies the object of interest and localizes it with respect to the system, and the target end-effector position is derived from these data. The subsystem then solves the inverse kinematics to estimate the joint positions that best place the end effector at the desired position and orientation. Once the system identifies the best collision-free trajectory, it sends the appropriate position and velocity commands to the joint actuators, which move the manipulator to the desired location and orientation in space. After successfully grasping the desired object, the system returns to the user, planning its way back to where the user was with the map that was generated on its way to the destination.
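The inverse kinematics step can be illustrated on a much simpler arm than the one used here. The sketch below solves the closed-form inverse kinematics of a planar two-link arm and checks the result with forward kinematics. It is not the solver for the 5-DoF arm: the two-joint planar geometry, the link lengths, and the function names are assumptions made purely for illustration.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Return (shoulder, elbow) joint angles in radians that reach (x, y).

    Returns None when the target is out of reach. Of the two possible
    solutions, the one with a non-negative elbow angle is returned.
    """
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_elbow) > 1.0:
        return None  # target lies outside the reachable workspace
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def two_link_fk(shoulder, elbow, l1, l2):
    """Forward kinematics: end-effector position for the given joint angles."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y
```

For an arm with more joints and an orientation constraint, the same question (which joint angles produce this end-effector pose) generally has to be answered numerically or with a dedicated solver.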