As part of the AI group at AMD, working on improving throughput (tokens/sec) of emerging LLM/SLM models on AMD Instinct GPUs across a variety of frameworks (PyTorch, Megatron, JAX).
- Spearheaded a mixed-precision effort focused on studying the impact of precision on different operators
and designing graph-level mixed-precision optimizations for an MLIR-based machine learning compiler.
- Served as compiler lead in bringing up customer-critical models on the custom compiler stack,
ensuring models were successfully lowered to the accelerator with reasonable performance and accuracy.
- Working on efficiently mapping large ML models onto the underlying hardware accelerator to
maximize performance and accuracy while enhancing the robustness of the compiler stack.
- Conducted an ILP limit study and built a performance-modeling analyzer to study the impact of various micro-architectural features across a diverse range of workloads.
- Studied performance bottlenecks of Vector Packet Processing (VPP) on Cavium processors.
- Developed an intelligent mode that fine-tunes various knobs to reduce the runtime of Design Compiler.
- Developed Firmleak, a pre-silicon design technique for runtime leakage-power characterization of processors.