We implemented a reinforcement learning–based framework for robust autonomous drone navigation under wind disturbances, building on the existing NavRL framework developed at CERLab. The system is designed to handle dynamic obstacle avoidance in realistic, cluttered environments where external disturbances such as wind can significantly degrade control performance. All training and evaluation were carried out in Isaac Sim, which provides high-fidelity physics, aerodynamic effects, and large-scale parallel simulation for efficient learning.
To improve temporal awareness and stability under disturbances, we extended the NavRL observation pipeline by stacking a history of drone states and processing this temporal information with a Transformer/LSTM-based policy architecture. Rather than relying solely on instantaneous observations, this enables the policy to infer unobserved dynamics such as wind forces, delayed system responses, and obstacle motion by learning patterns over time. This temporal learning is particularly important for dynamic obstacle avoidance, where safe decisions often depend on how the environment is evolving rather than on a single snapshot.
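A minimal sketch of the state-history stacking described above. The class and parameter names are illustrative, not NavRL's actual API; the stacked array is the kind of input a Transformer/LSTM policy head would consume.

```python
from collections import deque

import numpy as np

class ObservationHistory:
    """Fixed-length rolling buffer of past drone states (illustrative).

    The stacked history lets a temporal policy infer unobserved dynamics
    (e.g. wind) from motion trends rather than a single snapshot.
    """

    def __init__(self, state_dim: int, history_len: int):
        self.history_len = history_len
        # Pre-fill with zeros so the stack has a fixed shape from step 0.
        self.buffer = deque(
            [np.zeros(state_dim) for _ in range(history_len)],
            maxlen=history_len,
        )

    def push(self, state: np.ndarray) -> None:
        self.buffer.append(np.asarray(state, dtype=float))

    def stacked(self) -> np.ndarray:
        # Shape: (history_len, state_dim), oldest first.
        return np.stack(list(self.buffer))

# Usage: a hypothetical 13-D drone state (position, velocity, attitude, rates)
hist = ObservationHistory(state_dim=13, history_len=8)
hist.push(np.ones(13))
obs = hist.stacked()   # (8, 13) array fed to the temporal policy
```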
The reinforcement learning policy was trained using thousands of parallel drone instances, allowing the agent to experience a wide range of wind conditions, obstacle configurations, and interaction scenarios. Through this process, the policy learned control strategies that are both reactive and robust, maintaining stable flight while avoiding collisions even in the presence of strong and unpredictable disturbances. Quantitative evaluation showed a significant improvement in performance metrics—including success rate, collision avoidance, and trajectory stability—compared to non-temporal baselines, especially in environments with moving obstacles. Overall, this framework demonstrates how integrating temporal learning into the NavRL pipeline substantially enhances the reliability and safety of autonomous drone navigation in challenging real-world conditions.
Metrics after implementing Transformer-based learning for obstacle avoidance in Isaac Sim under windy conditions
We first train a reinforcement learning (RL) policy in high-fidelity simulation (Isaac Sim) to act as an expert navigator under wind disturbances and in obstacle-dense environments. This expert generates trajectories from which we construct an offline dataset of state–action sequences. A diffusion policy is then trained on this dataset to model the distribution of expert motion over time, learning to denoise and reconstruct smooth future trajectories conditioned on the current observation. Unlike RL, which produces instantaneous and often noisy control commands, the diffusion policy generates temporally consistent waypoint sequences that naturally enforce smoothness. During deployment, the model predicts a short horizon of future motion and executes it in a receding-horizon manner, combining the robustness of RL with the smooth, predictive behavior of generative trajectory modeling.
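A schematic sketch of the receding-horizon execution loop described above. A stand-in straight-line "plan" replaces the actual diffusion policy, and all names and horizon lengths are illustrative assumptions.

```python
import numpy as np

def plan_horizon(obs, horizon=8):
    """Stand-in for the diffusion policy: in the real system this would
    denoise a waypoint sequence conditioned on the observation. Here it
    just extends a straight line from the current state."""
    return obs + np.outer(np.arange(1, horizon + 1), np.ones_like(obs)) * 0.1

def receding_horizon_control(obs, steps=6, execute_k=2):
    """Execute only the first `execute_k` waypoints of each plan, then replan."""
    executed = []
    for _ in range(steps // execute_k):
        plan = plan_horizon(obs)        # (horizon, state_dim)
        for wp in plan[:execute_k]:     # commit only the head of the plan
            executed.append(wp)
            obs = wp                    # pretend the drone reached the waypoint
    return np.stack(executed)

traj = receding_horizon_control(np.zeros(3))   # (6, 3) executed waypoints
```

Replanning after every few waypoints is what lets the generative model stay reactive to disturbances while still producing smooth motion.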
This project implements Proximal Policy Optimization (PPO), off-policy, and model-based policies for robosuite manipulation (e.g., Lift with a Panda robot). Training uses Stable-Baselines3 and Gymnasium with observation and reward normalization, periodic evaluation, checkpointing, and optional Weights & Biases logging; evaluation loads a saved policy and VecNormalize statistics and runs deterministic rollouts with a configurable number of episodes and an on-screen or headless viewer. Overall, the project provides a ready-to-use PPO training and evaluation stack for robosuite.
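A minimal sketch of the running-statistics mechanism behind observation/reward normalization wrappers such as SB3's VecNormalize. The class is illustrative, not SB3's implementation; it shows why normalized statistics must be saved and reloaded for evaluation.

```python
import numpy as np

class RunningNormalizer:
    """Running mean/variance via the parallel-variance update (illustrative)."""

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps

    def update(self, batch: np.ndarray) -> None:
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        n = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        # Combine the two groups' variances (parallel variance formula).
        m_a = self.var * self.count
        m_b = batch_var * n
        self.var = (m_a + m_b + delta**2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x: np.ndarray) -> np.ndarray:
        return (x - self.mean) / np.sqrt(self.var + 1e-8)

norm = RunningNormalizer(shape=(3,))
norm.update(np.random.default_rng(0).normal(5.0, 2.0, size=(1000, 3)))
z = norm.normalize(np.full(3, 5.0))   # near 0 once statistics converge
```

At evaluation time the same `mean`/`var` must be restored, which is why the project loads VecNormalize stats alongside the saved policy.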
Implemented a nonlinear Model Predictive Control (NMPC) controller using the ACADO solver in Gazebo for the existing CERLab UAV framework, accounting for external wind forces with a wind disturbance observer (extended Kalman filter). It tracked trajectories with much better accuracy than the framework's current tracking controller. Modified the PX4 firmware attitude controller by integrating customized proportional and integral gains to incorporate the NMPC.
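A 1-D Kalman-filter sketch of the wind disturbance observer idea: augment the state with an unmeasured wind acceleration and estimate it from noisy velocity measurements. All matrices, noise levels, and the open-loop setup are illustrative assumptions, not the project's actual values.

```python
import numpy as np

dt = 0.02
A = np.array([[1.0, dt],      # v_{k+1} = v_k + dt*(u_k + wind_k)
              [0.0, 1.0]])    # wind modeled as a slow random walk
B = np.array([[dt], [0.0]])
H = np.array([[1.0, 0.0]])    # only velocity is measured
Q = np.diag([1e-4, 1e-4])     # process noise (velocity, wind) - illustrative
R = np.array([[1e-2]])        # measurement noise - illustrative

rng = np.random.default_rng(1)
true_wind = 1.5               # hidden constant wind acceleration
v_true = 0.0
x = np.zeros((2, 1))          # estimate of [velocity, wind]
P = np.eye(2)

for _ in range(2000):
    u = 0.0                   # open loop here; the NMPC would subtract x[1, 0]
    v_true += dt * (u + true_wind)
    z = v_true + rng.normal(0.0, 0.1)
    # Predict
    x = A @ x + B * u
    P = A @ P @ A.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K * (z - (H @ x)[0, 0])
    P = (np.eye(2) - K @ H) @ P

wind_est = x[1, 0]            # converges toward true_wind
```

The real observer is an EKF over the nonlinear UAV dynamics, but the augmented-state pattern is the same: the estimated disturbance is fed to the NMPC as a feedforward correction.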
Built a custom drone using PX4/ArduPilot, an NVIDIA Jetson Orin, a Livox LiDAR, and a RealSense D435i camera for the Japanese sponsor. Deployed in a team of 3 and successfully completed obstacle-avoidance experiments with the drone in a Fukushima tunnel.
We validated a baseline Neural Radiance Fields (NeRF) implementation by reproducing standard training and rendering results on benchmark datasets, ensuring consistency with reported performance metrics. Building on this validated baseline, we investigated the use of learnable Fourier feature encodings to address NeRF’s well-known spectral bias, which limits its ability to represent high-frequency scene details such as sharp edges, fine textures, and thin structures. Instead of using fixed positional encodings, the Fourier frequencies were treated as trainable parameters, allowing the model to adaptively allocate representational capacity to the most informative spatial frequencies in the scene.
This modification significantly improved the reconstruction of fine-grained visual details and accelerated convergence during training. Quantitative evaluation on the LLFF dataset, using the standard flower scene, demonstrated a +6 dB improvement in PSNR compared to the baseline NeRF model, indicating a substantial gain in image fidelity. Qualitative results further confirmed sharper boundaries, reduced blurring, and improved texture preservation. These results highlight the effectiveness of learnable Fourier encodings as a simple yet powerful enhancement to NeRF models for high-quality view synthesis.
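A minimal sketch of the learnable Fourier encoding idea. The forward pass is shown in NumPy for clarity; in the actual model the frequency vector would be a trainable parameter (e.g. an `nn.Parameter` in PyTorch) updated by backprop, and the initialization shown is the standard powers-of-two positional encoding.

```python
import numpy as np

def fourier_features(x, freqs):
    """gamma(x) = [sin(2*pi*f_i*x), cos(2*pi*f_i*x)] per frequency f_i.

    With fixed freqs this is NeRF's standard positional encoding; making
    freqs trainable lets the model allocate capacity to the spatial
    frequencies that matter most in the scene.
    """
    proj = 2.0 * np.pi * np.outer(x, freqs)                 # (N, L)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

L = 6
freqs = 2.0 ** np.arange(L)          # init like the fixed encoding, then train
feats = fourier_features(np.linspace(0.0, 1.0, 4), freqs)  # (4, 2L)
```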
Designed and implemented a real-time convex MPC controller in ROS 2 for the Unitree Go2 quadruped, casting centroidal balance and motion tracking as a quadratic program that optimizes ground reaction forces over a prediction horizon. The formulation enforces friction-cone, unilateral contact, and minimum/maximum normal force constraints to avoid slipping and ensure physically valid contacts. Integrated Gurobi as the high-performance QP backend and Pinocchio for rigid-body dynamics and kinematics, using foot Jacobians to convert optimized contact forces into joint torques. Connected the full pipeline to the robot’s state-estimation and low-level control interfaces, enabling closed-loop torque control for stable stance.
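A sketch of the linearized friction-cone and normal-force constraints for a single contact, written as rows of C f <= d. This is the standard pyramid approximation used in convex MPC formulations; the numeric values are illustrative, not the project's tuned limits.

```python
import numpy as np

def friction_pyramid(mu, f_min, f_max):
    """Constraints C f <= d on one contact force f = [fx, fy, fz]
    (z = contact normal): |fx| <= mu*fz, |fy| <= mu*fz, f_min <= fz <= f_max.
    """
    C = np.array([
        [ 1.0,  0.0, -mu],   #  fx - mu*fz <= 0
        [-1.0,  0.0, -mu],   # -fx - mu*fz <= 0
        [ 0.0,  1.0, -mu],   #  fy - mu*fz <= 0
        [ 0.0, -1.0, -mu],   # -fy - mu*fz <= 0
        [ 0.0,  0.0, -1.0],  # -fz <= -f_min  (unilateral contact, min load)
        [ 0.0,  0.0,  1.0],  #  fz <=  f_max  (actuator/structural limit)
    ])
    d = np.array([0.0, 0.0, 0.0, 0.0, -f_min, f_max])
    return C, d

C, d = friction_pyramid(mu=0.6, f_min=5.0, f_max=250.0)
f_ok = np.array([10.0, -10.0, 100.0])   # inside the cone
f_slip = np.array([100.0, 0.0, 10.0])   # tangential force too large
feasible = np.all(C @ f_ok <= d)
```

Stacking one such block per stance foot (rotated into each contact frame) gives the inequality constraints of the QP that the solver handles each control cycle.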
Experimented with full Newton vs. Gauss-Newton methods, then used them to solve for the motor torques that balance a quadruped on one leg.
Direct collocation (DIRCOL) for a cart-pole, balancing it at the inverted position.
Used iLQR to solve a trajectory optimization problem for a 6-DOF quadrotor.
Used a direct method to optimize a walking trajectory for a simple biped model, using the hybrid dynamics formulation (jump maps).
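A toy nonlinear least-squares problem contrasting the two step types from the first assignment. The Rosenbrock-style residual is an illustrative example, not the quadruped balance problem itself; Gauss-Newton keeps only the J^T J curvature, while full Newton adds the residual-weighted second-order term.

```python
import numpy as np

# Minimize 0.5 * ||r(x)||^2 for a Rosenbrock-style residual.
def residual(x):
    return np.array([x[0] - 1.0, 10.0 * (x[1] - x[0] ** 2)])

def jacobian(x):
    return np.array([[1.0, 0.0], [-20.0 * x[0], 10.0]])

def gauss_newton_step(x):
    J, r = jacobian(x), residual(x)
    # Keeps only J^T J: drops the second-order residual curvature term.
    return x - np.linalg.solve(J.T @ J, J.T @ r)

def full_newton_step(x):
    J, r = jacobian(x), residual(x)
    # Exact Hessian: J^T J + sum_i r_i * Hess(r_i); only r_2 has curvature here.
    H = J.T @ J + r[1] * np.array([[-20.0, 0.0], [0.0, 0.0]])
    return x - np.linalg.solve(H, J.T @ r)

x = np.array([-1.2, 1.0])
for _ in range(10):
    x = gauss_newton_step(x)
# x converges to the zero-residual minimum (1, 1).
```

For zero-residual problems like this one, the curvature term vanishes at the solution, so Gauss-Newton matches full Newton's fast local convergence at lower cost; with large residuals the extra term matters.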
HW1P1: Neural Network Components from Scratch
Implement linear layers, activation functions, loss functions, and optimizers using NumPy
Build multi-layer perceptrons (MLPs) from scratch
Understand forward and backward propagation
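A minimal sketch of the HW1P1-style linear layer with manual forward and backward passes in NumPy; initialization and naming are illustrative.

```python
import numpy as np

class Linear:
    """Fully connected layer with hand-derived forward/backward."""

    def __init__(self, in_features, out_features, rng):
        self.W = rng.normal(0.0, 0.1, size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x                       # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        # Chain rule: gradients w.r.t. parameters and w.r.t. the input.
        self.dW = self.x.T @ grad_out
        self.db = grad_out.sum(axis=0)
        return grad_out @ self.W.T

rng = np.random.default_rng(0)
layer = Linear(4, 3, rng)
y = layer.forward(np.ones((2, 4)))      # batch of 2 -> shape (2, 3)
dx = layer.backward(np.ones((2, 3)))    # gradient w.r.t. input, shape (2, 4)
```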
HW1P2: Frame-Level Speech Recognition
Build neural networks for phoneme classification from MFCC features
Participate in Kaggle competition
Apply deep learning to speech processing
HW2P1: CNN Components Implementation
Implement 1D and 2D convolutional layers from scratch
Build pooling, resampling, and transposed convolution operations
Construct complete CNN architectures
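A minimal sketch of the core operation behind the HW2P1 Conv1d: valid cross-correlation implemented with explicit loops (the course implementation is vectorized; shapes and names here are illustrative).

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1-D cross-correlation.

    x: (in_ch, length), w: (out_ch, in_ch, k) -> output (out_ch, out_len)
    """
    in_ch, length = x.shape
    out_ch, _, k = w.shape
    out_len = (length - k) // stride + 1
    out = np.zeros((out_ch, out_len))
    for o in range(out_ch):
        for t in range(out_len):
            window = x[:, t * stride : t * stride + k]
            out[o, t] = np.sum(window * w[o])   # dot over channels and taps
    return out

x = np.arange(6, dtype=float).reshape(1, 6)   # single input channel
w = np.ones((1, 1, 3))                        # moving-sum kernel
y = conv1d(x, w)                              # [[3., 6., 9., 12.]]
```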
HW2P2: Face Recognition with Metric Learning
Implement ResNet architecture from scratch
Apply ArcFace loss for face recognition
Build face verification system evaluated on EER metric
HW3P1: RNNs, GRUs, and CTC
Implement RNN and GRU cells from scratch
Build Connectionist Temporal Classification (CTC) loss and decoding
Create sequence classification models
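A minimal sketch of the Elman RNN cell implemented in HW3P1, unrolled over a short sequence; weight scales and dimensions are illustrative.

```python
import numpy as np

def rnn_cell(x_t, h_prev, W_ih, W_hh, b):
    """One RNN step: h_t = tanh(x_t W_ih + h_{t-1} W_hh + b)."""
    return np.tanh(x_t @ W_ih + h_prev @ W_hh + b)

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 6
W_ih = rng.normal(0.0, 0.5, size=(d_in, d_h))
W_hh = rng.normal(0.0, 0.5, size=(d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)
for t in range(T):                       # unroll over a length-6 sequence
    h = rnn_cell(rng.normal(size=d_in), h, W_ih, W_hh, b)
# h is the final hidden state summarizing the sequence
```

A GRU cell follows the same pattern with update/reset gates; BPTT runs these steps in reverse to accumulate gradients.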
HW3P2: Sequence-to-Sequence ASR
Build encoder-decoder RNN architecture for speech recognition
Apply CTC loss for end-to-end training
Implement beam search decoding
HW4P1: Decoder-Only Transformer
Implement transformer components (attention, positional encoding, decoder layers)
Build GPT-style language model
Train on large-scale text data
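A minimal sketch of the single-head causal self-attention at the heart of the HW4P1 decoder, in NumPy; projection matrices and sizes are illustrative.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product attention with a causal mask.

    x: (T, d) token embeddings -> (T, d) outputs, where token t may only
    attend to positions <= t (the GPT-style autoregressive constraint).
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # strictly future slots
    scores[mask] = -np.inf                              # block future tokens
    # Numerically stable row-wise softmax.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
out = causal_self_attention(x, *W)   # (5, 8); row 0 sees only token 0
```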
HW4P2: Encoder-Decoder Transformer for ASR
Build full encoder-decoder transformer architecture
Apply to automatic speech recognition
Implement cross-attention, CTC auxiliary loss, and beam search
Engineered PID and full-state feedback controllers for an autonomous vehicle by linearizing its state-space model; assessed controllability, applied optimal control with A* path planning, and integrated EKF SLAM for sensor-independent navigation.
https://github.com/abiengc97/Modern-Control-Theory-CMU-24677.git
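A minimal illustration of the controllability assessment mentioned above, using the Kalman rank condition on a simple double-integrator model (a stand-in for the linearized vehicle dynamics, not the project's actual model).

```python
import numpy as np

# Double-integrator state [position, velocity] with acceleration input.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

n = A.shape[0]
# Kalman controllability matrix [B, AB, ..., A^{n-1}B]
C = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
controllable = np.linalg.matrix_rank(C) == n   # full rank -> controllable
```

Full rank here justifies placing closed-loop poles arbitrarily with full-state feedback, which is what the pole-placement/optimal-control design relies on.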
Used a neural network to predict the end-effector position (forward kinematics).
Led a 7-member team to design and fabricate an elbow exoskeleton featuring a gravity-balanced pulley mechanism. Focused on mechanical advantage and ergonomic assistance for upper-limb support.
Final report: https://drive.google.com/file/d/1sxnLKjUrD1GlQRoTOs6bCb2vJt31PXyl/view?usp=sharing