Deep Learning Computational Graph Engineer at Intel, San Diego, USA
July 2018 – present
Working at Habana, enabling and optimizing models for Gaudi, Goya and Gaudi2 (2020 - present)
Developed a prototype ONNX Runtime backend for Goya, with features such as quantization
Optimizing models such as HuBERT, SegNet, LLaMA (text generation) and YOLO on frameworks such as Hugging Face and DeepSpeed, using techniques such as dynamic-to-static conversion, input padding/bucketing, the Gaudi media pipeline and HPU Graphs.
Driving the dynamic-shapes user experience:
Moderator of Habana Forum
Worked at Nervana on the compiler front-end for TensorFlow-to-nGraph conversion (2018-2020)
As a core contributor to the open-source nGraph TensorFlow bridge repo, contributed to the architectural design and implementation of the codebase, which converts TensorFlow deep learning models to nGraph, Intel's graph compiler.
Worked on features such as improvements to deadness analysis, ahead-of-time compilation, subgraph encapsulation, debug tools, a model equivalence test framework, and build systems.
Enabled multiple TensorFlow models to run on nGraph, drawing on a deep understanding of graph algorithms, compiler theory, deep learning, C++ and Python.
Intern (Autonomous Driving) at Nvidia Corporation, Santa Clara, USA
July 2012 – June 2014
Developed methods for estimating inference times of deep networks run using TensorRT on Drive PX 2
Developed a testing and verification framework for integrating TensorFlow support into TensorRT
System Software Engineer at Nvidia Graphics Private Limited, Pune, India
July 2012 – June 2014
As part of the Android Video Team, developed drivers and microcode for hardware modules relating to video capture, playback and security in the Tegra mobile processor.
Designed and implemented a technique for dynamically scaling the frequency of the Video Decoder Engine in the Tegra SoC, reducing power consumption during video playback without dropping frames.
Worked on microcode for the Tegra Security Engine (a cryptography module) that decrypts encrypted Transport Stream files, strips the Packetized Elementary Stream (PES) header from the payload and then re-encrypts the resulting Elementary Stream.
Developed the parser, driver and microcode for the then-new VP9 codec (both normal and encrypted playback).
Automated unit testing and developed an interactive mixed-mode disassembly tool for debugging microcode.
Intern at Instruments Research and Development Establishment, Dehradun, India
May 2011 – June 2011
Developed drivers for interfacing the ATmega2560 with OLED display screens, connected a PC to the ATmega2560 over a UART link and implemented software debouncing of input buttons. Integrating the above components, designed and implemented a menu-driven user interface for operating systems such as laser designators.
Received an ‘outstanding’ performance rating for contributions.