Accuracy: Identifying and classifying waste items to minimize sorting errors.
Flexibility: Ability to handle a wide range of waste types, including varying shapes, sizes, and materials.
Reliability: Consistent performance under different environmental conditions and usage scenarios.
Collision-Free Grasping: Approach items from above to minimize interference with neighboring objects.
Efficient Sorting: Quickly classify and move items to their proper bins.
Perception and Classification:
A RealSense camera captures images of the workspace.
These images are sent to a remote GPU-enabled environment (Google Colab) running an open-vocabulary object detection model, which returns both a classification (recyclable or non-recyclable) and a 2D bounding box around the detected object.
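The response handling can be sketched as follows. This is a minimal illustration, assuming the remote service returns a JSON-like payload with a text label and a pixel-space bounding box; the label set and response keys here are hypothetical, not the actual Colab service's schema.

```python
# Hypothetical label set mapped to "recyclable"; everything else is non-recyclable.
RECYCLABLE_LABELS = {"bottle", "can", "paper", "cardboard"}

def parse_detection(response: dict) -> dict:
    """Parse an assumed {'label': str, 'bbox': [x1, y1, x2, y2]} detection
    and return the sorting category plus the bbox center in pixels."""
    label = response["label"]
    x1, y1, x2, y2 = response["bbox"]
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    category = "recyclable" if label in RECYCLABLE_LABELS else "non-recyclable"
    return {"category": category, "center_px": center}

result = parse_detection({"label": "bottle", "bbox": [100, 50, 200, 150]})
print(result)  # {'category': 'recyclable', 'center_px': (150.0, 100.0)}
```

The bbox center computed here is what gets handed to the localization step below.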
Coordinate Transformation and Localization:
AR tags are placed in the scene to provide a reference for mapping 2D bounding box coordinates into the robot’s 3D workspace.
Instead of relying on a full depth measurement, we use a fixed-height approach: the bounding box center is projected into the robot's base frame at a predefined height (Z-position). This simplifies depth estimation and avoids the complex calibration required for precise depth sensing.
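A minimal sketch of the fixed-height projection is below. The intrinsics (FX, FY, CX, CY), the camera-to-table depth Z_TABLE, the grasp height GRASP_Z, and the camera-to-base transform T_BASE_CAM are all illustrative placeholders; in the real system the transform comes from the AR-tag calibration.

```python
import numpy as np

# Assumed pinhole intrinsics (focal lengths and principal point, in pixels).
FX, FY = 615.0, 615.0
CX, CY = 320.0, 240.0
Z_TABLE = 0.80   # assumed fixed depth from camera to table plane, metres
GRASP_Z = 0.05   # predefined pick height in the robot base frame, metres

# Placeholder: the AR-tag calibration supplies the real camera-to-base transform.
T_BASE_CAM = np.eye(4)

def pixel_to_base(u: float, v: float) -> np.ndarray:
    """Back-project a bbox-center pixel onto the table plane at the fixed
    depth, map it into the base frame, and clamp Z to the grasp height."""
    p_cam = np.array([(u - CX) * Z_TABLE / FX,
                      (v - CY) * Z_TABLE / FY,
                      Z_TABLE, 1.0])
    p_base = T_BASE_CAM @ p_cam
    # Z is overridden with the predefined height rather than a depth reading.
    return np.array([p_base[0], p_base[1], GRASP_Z])
```

With an identity transform, a pixel at the principal point maps to (0, 0, GRASP_Z), which is a quick sanity check on the projection.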
Motion Strategy:
To approach the object safely, the robotic arm first moves to a position directly above the item at a known safe height, ensuring it does not collide with other objects in the scene.
After positioning above the target, the arm moves straight down to grasp the item.
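The two-phase approach above can be sketched as a short helper. The `arm` object and its `move_to` / `close_gripper` methods are placeholders for the actual arm controller API, and SAFE_Z is an assumed hover height.

```python
SAFE_Z = 0.25  # assumed collision-free hover height above the workspace, metres

def approach_and_grasp(arm, target_xyz):
    """Top-down grasp in two phases: hover over the target at a safe height,
    then descend straight down and close the gripper.
    `arm` is any controller exposing move_to(x, y, z) and close_gripper()."""
    x, y, z = target_xyz
    arm.move_to(x, y, SAFE_Z)  # phase 1: lateral travel at safe height
    arm.move_to(x, y, z)       # phase 2: pure vertical descent onto the item
    arm.close_gripper()
```

Keeping all lateral motion at SAFE_Z is what gives the collision-free property with respect to neighboring objects.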
Sorting Action:
Once the item is grasped, the robot uses the model's classification (recyclable or non-recyclable) to select the destination bin.
The robot then moves the item to the corresponding bin for proper sorting.
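The bin selection step reduces to a small lookup. The bin poses below are hypothetical base-frame coordinates, not the real drop-off positions.

```python
# Hypothetical drop-off poses (x, y, z) in the robot base frame, metres.
BIN_POSES = {
    "recyclable": (0.40, 0.30, 0.20),
    "non-recyclable": (0.40, -0.30, 0.20),
}

def bin_pose_for(category: str):
    """Map the model's classification to the corresponding bin pose."""
    if category not in BIN_POSES:
        raise ValueError(f"unknown category: {category}")
    return BIN_POSES[category]
```

Raising on an unknown label is a deliberate choice: a misrouted item is worse than a paused cycle that an operator can inspect.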
Fixed Height vs. Depth Sensing:
We initially considered using Dex-Net for advanced grasp planning, but due to the camera angle and inconsistent depth information we opted for a fixed-height pick strategy. This reduced complexity and increased reliability, but limited dynamic adaptation to objects of varying heights.
Cloud Processing vs. Local Processing:
Using Google Colab’s GPU environment allowed us to run complex models without local hardware constraints. However, this introduced communication latency. We chose this approach to leverage state-of-the-art object detection models while accepting minor delays in classification.
AR Tags for Localization:
AR tags provided a straightforward method to establish a consistent reference frame. This eased the coordinate transformation process at the cost of needing additional calibration and ensuring tags remain visible and fixed in place.
Robustness: The fixed-height approach and AR tag-based localization simplify the system, reducing failure points.
Durability: Fewer complex calibration procedures mean less frequent adjustments, potentially improving long-term durability.
Efficiency: While waiting for remote classification slows down the process slightly, the simplified approach to grasping reduces mechanical and computational overhead, helping maintain consistent operation.