Video instance segmentation models like VITA excel at identifying objects within individual frames but often struggle to maintain consistent object identities across time, leading to ID switches where the same object receives different labels in subsequent frames. This project addresses this limitation by integrating a temporal association module inspired by TCOVIS into VITA's architecture. The module associates object tokens across frames using learned embeddings, a memory bank with exponential moving average updates, and optimal matching based on the Hungarian algorithm. It was completed as a class project for the CMSC848k course at the University of Maryland, College Park.
GitHub: https://github.com/Kaustubh484/TCOVITA
Our implementation features a lightweight 2-layer MLP embedding head that generates comparable instance representations over time, enabling robust temporal consistency without extensive architectural changes. The temporal association module operates as a post-processing tracker that works with pre-computed segmentations, making it practical for individual research efforts while maintaining research validity. This approach focuses on the core challenge of temporal object association rather than full model retraining.
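The association step can be illustrated with a minimal sketch. It is not taken from the project repository: the `MemoryBank` class, the `momentum` value, and the cosine-distance cost are assumptions made for illustration. To stay dependency-free, it also replaces the Hungarian algorithm with an exhaustive search over assignments, which returns the same optimal matching for a handful of instances but does not scale.

```python
import itertools
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Hypothetical per-track memory with exponential moving average updates."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed value, not from the project
        self.tracks = {}          # track id -> remembered embedding

    def associate(self, embeddings):
        """Return one track id per embedding for the current frame."""
        if not self.tracks:
            # First frame: every instance starts its own track.
            for i, emb in enumerate(embeddings):
                self.tracks[i] = list(emb)
            return list(range(len(embeddings)))

        ids = list(self.tracks)
        if len(embeddings) > len(ids):
            raise ValueError("this sketch does not spawn new tracks")

        # Exhaustive minimum-cost assignment (stand-in for Hungarian matching).
        best_ids, best_cost = None, float("inf")
        for cand in itertools.permutations(ids, len(embeddings)):
            cost = sum(1.0 - cosine(e, self.tracks[t])
                       for e, t in zip(embeddings, cand))
            if cost < best_cost:
                best_ids, best_cost = cand, cost

        # EMA update of the matched tracks.
        m = self.momentum
        for emb, t in zip(embeddings, best_ids):
            self.tracks[t] = [m * old + (1 - m) * new
                              for old, new in zip(self.tracks[t], emb)]
        return list(best_ids)


bank = MemoryBank(momentum=0.9)
first = bank.associate([[1.0, 0.0], [0.0, 1.0]])
# Same two objects, detected in swapped order in the next frame.
second = bank.associate([[0.1, 0.9], [0.9, 0.1]])
print(first, second)
```

In this toy run the second frame lists the instances in the opposite order, and the matching recovers the original identities instead of assigning labels by detection order.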
Qualitative results of TCOVITA compared to baseline on YouTube-VOS 2019, demonstrating superior temporal consistency in settings with fast movement and occlusion. Check out the full project report: https://drive.google.com/file/d/1DZahKQqKB0zEfvRJVgFYd0SLVmWfZgZA/view?usp=sharing
This project develops an end-to-end autonomous navigation system for quadcopters using deep reinforcement learning in PyBullet simulation. I implemented and compared three state-of-the-art algorithms (Soft Actor-Critic, Deep Deterministic Policy Gradient, and Proximal Policy Optimization) for waypoint-following tasks in 3D environments. The SAC model achieved 94.2% waypoint completion with smooth, efficient trajectories, outperforming traditional PID controllers that require extensive manual tuning. The system features a modular architecture supporting multiple sensor modalities (LiDAR, depth, RGB cameras) and curriculum learning that reduced training time by 67%.
The project addresses critical challenges in DRL-based robotics including reward shaping to prevent early termination behaviors, action smoothing for stable flight dynamics, and progressive task difficulty scaling. Key technical contributions include a physics-based Crazyflie 2.X simulation with 240Hz dynamics, an 18-dimensional state representation using body-frame coordinates, and comprehensive reward engineering that balances exploration and goal-reaching. The system runs headless at 4-5× real-time speed, making it practical for large-scale experimentation. This work demonstrates the viability of model-free reinforcement learning for continuous control in robotics and establishes a foundation for future research in obstacle avoidance and sim-to-real transfer.
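As a rough illustration of two of these ideas, the sketch below shows an exponential low-pass filter for action smoothing and a progress-based shaped reward. The names `ActionSmoother` and `shaped_reward`, along with every constant, are hypothetical and not taken from the project code; the actual reward terms may differ.

```python
class ActionSmoother:
    """Blend each new action with the previous one to damp abrupt changes."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # assumed blend factor; 1.0 disables smoothing
        self.prev = None

    def __call__(self, action):
        if self.prev is None:
            self.prev = list(action)
        else:
            self.prev = [self.alpha * a + (1 - self.alpha) * p
                         for a, p in zip(action, self.prev)]
        return list(self.prev)


def shaped_reward(prev_dist, dist, reached, crashed,
                  progress_scale=1.0, goal_bonus=10.0,
                  crash_penalty=10.0, step_cost=0.01):
    """Reward progress toward the waypoint; all weights are illustrative."""
    reward = progress_scale * (prev_dist - dist) - step_cost
    if reached:
        reward += goal_bonus
    if crashed:
        # A crash penalty much larger than the per-step cost makes ending
        # the episode early less attractive than continuing to fly.
        reward -= crash_penalty
    return reward


smoother = ActionSmoother(alpha=0.5)
print(smoother([1.0, 0.0]))  # first action passes through unchanged
print(smoother([0.0, 1.0]))  # later actions are blended with the previous one
print(shaped_reward(prev_dist=2.0, dist=1.5, reached=False, crashed=False))
```

The design intent in this sketch is that the termination penalty dominates any accumulated step cost, so the policy has no incentive to end an episode quickly.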
GitHub: https://github.com/Kaustubh484/pybullet-drone-navigation
This project presents a comprehensive performance comparison of two leading Large Language Model serving frameworks, vLLM and SGLang, evaluating their efficiency in production-scale deployment scenarios. I conducted systematic benchmarking across three key dimensions: concurrency handling (batch request processing), workload scalability (varying prompt complexity), and system stress testing. Using a quantized Mistral-7B-Instruct model on consumer hardware (RTX 4060, 8GB VRAM), the evaluation measured critical metrics including latency, throughput, GPU utilization, memory efficiency, and thermal performance under realistic production workloads.
The results demonstrate SGLang's superior average-case performance, maintaining 52-53 tokens/second throughput across increasing batch sizes while vLLM degraded to below 30 tokens/second at concurrency level 8. SGLang achieved 10% lower latency across all workload types, 96.1% GPU utilization compared to vLLM's 91.1%, and notably operated 15°C cooler despite higher hardware saturation. However, tail latency analysis revealed an important tradeoff: while SGLang dominated P50-P90 percentiles with 12.5% faster median response times, vLLM provided better worst-case guarantees at P95-P99, making it more suitable for strict SLA requirements. This analysis provides actionable insights for selecting appropriate serving infrastructure based on specific deployment constraints and performance priorities.
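The median versus tail tradeoff is easier to see with a small percentile calculation. The sketch below uses the nearest-rank definition and purely synthetic latency samples for two hypothetical servers; the numbers are not measurements from this benchmark, and the report may compute percentiles differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in seconds (illustrative only).
server_a = [1.0] * 90 + [5.0] * 10  # fast most of the time, slow tail
server_b = [1.2] * 100              # slightly slower, but very consistent

for name, data in (("server_a", server_a), ("server_b", server_b)):
    summary = {p: percentile(data, p) for p in (50, 90, 95, 99)}
    print(name, summary)
```

In this synthetic case `server_a` wins at P50 and P90 while `server_b` wins at P95 and P99, which is the shape of tradeoff that matters when a deployment is bound by a strict latency SLA.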
Read Full Report: https://drive.google.com/file/d/1UIe_7AvK6mnxy5LNU2SMtPWG5XpD2tnP/view?usp=sharing
DevDiary is a personal developer journal assistant that automates the logging, summarization, and reporting of Git-based activity across multiple projects. It helps developers prepare stand-up updates and weekly retrospectives and maintain clear work logs, all powered by open-source LLMs.