Improve robustness to lighting and background variation by replacing color-based segmentation with learning-based methods such as YOLO or SAM, enabling more reliable object detection in cluttered, noisy scenes.
Enable multi-object detection and tracking to support more complex tasks involving multiple candidate objects and dynamic environments.
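As a minimal sketch of the association step at the heart of multi-object tracking, the toy class below matches detection centroids to existing tracks by nearest-neighbour distance. It is purely illustrative (a real system would use an established tracker such as SORT or DeepSORT); the `max_dist` threshold and the centroid coordinates are made-up values.

```python
import math

class CentroidTracker:
    """Illustrative nearest-neighbour centroid tracker (not production MOT).

    Each detection centroid is assigned to the closest existing track within
    `max_dist`; unmatched detections start new tracks. Tracks with no match
    in the current frame are dropped for simplicity.
    """
    def __init__(self, max_dist=50.0):
        self.max_dist = max_dist  # assumed matching threshold, in pixels
        self.next_id = 0
        self.tracks = {}  # track id -> (x, y) centroid

    def update(self, centroids):
        assigned = {}
        free = dict(self.tracks)  # tracks still available for matching
        for (x, y) in centroids:
            best_id, best_d = None, self.max_dist
            for tid, (tx, ty) in free.items():
                d = math.hypot(x - tx, y - ty)
                if d < best_d:
                    best_id, best_d = tid, d
            if best_id is None:
                best_id = self.next_id  # no close track: start a new one
                self.next_id += 1
            else:
                free.pop(best_id)       # claim the matched track
            assigned[best_id] = (x, y)
        self.tracks = assigned
        return assigned

tracker = CentroidTracker()
frame1 = tracker.update([(10, 10), (100, 100)])  # two new tracks: ids 0, 1
frame2 = tracker.update([(12, 11), (101, 99)])   # same ids persist
```

The same identities persist across frames because each slightly-moved centroid stays within `max_dist` of its previous position.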
Improve camera calibration and coordinate transformation to increase the accuracy of the 3D point estimates used to position the gripper for manipulation.
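The core of this pipeline can be sketched with a pinhole camera model: back-project a pixel plus a depth reading into the camera frame, then apply the camera-to-base extrinsics. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the extrinsic transform below are placeholder values, not the robot's actual calibration.

```python
import numpy as np

# Assumed pinhole intrinsics (placeholder values, not a real calibration).
fx, fy = 615.0, 615.0   # focal lengths in pixels
cx, cy = 320.0, 240.0   # principal point in pixels

def pixel_to_camera(u, v, depth):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Assumed extrinsics: camera mounted 0.10 m above the base, no rotation.
T_base_cam = np.eye(4)
T_base_cam[2, 3] = 0.10

def camera_to_base(p_cam):
    """Apply the homogeneous camera-to-base transform to a 3D point."""
    p = np.append(p_cam, 1.0)
    return (T_base_cam @ p)[:3]

p_cam = pixel_to_camera(400, 300, 0.5)   # a pixel 0.5 m from the camera
p_base = camera_to_base(p_cam)           # same point in the base frame
```

Better calibration tightens `fx/fy/cx/cy` (plus distortion terms, omitted here) and `T_base_cam`, which directly tightens the 3D estimate the gripper is sent to.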
More sophisticated exploration methods could be implemented to increase search efficiency.
One such example is the A* algorithm: the robot could use the camera to build a map of the environment it has encountered and use this map to quickly identify the best routes to targets.
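A minimal version of this idea, assuming the map is discretized into a 4-connected occupancy grid (the grid and start/goal cells below are made up for illustration):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).

    `grid` is a list of rows; `start`/`goal` are (row, col) tuples.
    Returns the path as a list of cells, or None if the goal is unreachable.
    Manhattan distance is used as the admissible heuristic.
    """
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # (f = g + h, g, cell)
    came_from = {}
    g = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]                 # reconstruct by walking parents
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g.get(cur, float("inf")):
            continue                     # stale heap entry, skip
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g[cur] + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cur
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))  # must detour around the wall in row 1
```

Because the heuristic never overestimates the true cost, the first time the goal is popped the path is optimal; here that is the 7-cell route around the obstacle row.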
Another improvement would be full utilization of the robot's ability to translate in any direction. At present the robot is programmed only to spin and move forward/backward. More fluid motion that rotates and translates simultaneously would allow more efficient movement and better positioning for object grasping.
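One way to express such combined motion, assuming a holonomic base that accepts a planar body-frame twist (vx, vy, wz): convert the desired world-frame velocity into the body frame so the robot can slide toward the object while simultaneously rotating to face it. The numbers below are illustrative.

```python
import math

def world_to_body_twist(vx_w, vy_w, wz, theta):
    """Convert a desired world-frame planar velocity into body-frame commands.

    A holonomic base can execute (vx_b, vy_b, wz) all at once, replacing the
    current spin-then-drive behaviour. `theta` is the robot heading in radians;
    the rotation below is the standard 2D world-to-body frame change.
    """
    c, s = math.cos(theta), math.sin(theta)
    vx_b = c * vx_w + s * vy_w
    vy_b = -s * vx_w + c * vy_w
    return vx_b, vy_b, wz

# Robot facing world +y (theta = pi/2), asked to move along world +x while
# turning at 0.3 rad/s: the body-frame command is pure sideways translation.
vx_b, vy_b, wz = world_to_body_twist(1.0, 0.0, 0.3, math.pi / 2)
```

Sending this twist to the drive controller each cycle lets translation and rotation overlap, shortening approach trajectories and pre-aligning the gripper during the drive.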
Vision-Driven Affordance & Grasp Generation
This involves transitioning from fixed Cartesian coordinates to dynamic, vision-based planning. By integrating visual affordance models, the robot would autonomously analyze object geometry and scene context in real time, allowing the system to infer where and how to interact with objects in unstructured environments and removing the need for predefined waypoints.
Generative Grasp Synthesis (GAN / Diffusion)
Leveraging generative models such as GANs or diffusion models to synthesize diverse, robust 6-DoF grasp poses for novel objects. This data-driven approach avoids the limitations and local-minima traps of traditional analytical IK solvers, enabling the robot to reliably predict and execute successful manipulation strategies even for completely unseen shapes.