Dheevatsa Mudigere

Distinguished Engineer, Nvidia

GPU compute architecture group, Accelerated Compute and Systems for AI

  • Previously:

    • Principal Research Scientist at AI System Co-design group, Infrastructure, Meta

    • Research Scientist at the Parallel Computing Lab, Intel Labs

    • Scientist at the Computing and Decision Sciences Lab, GE Global Research

  • MSc (Hons) in Computational Science & Engineering (CSE) from T.U München (TUM)

  • Bachelor's degree in Mechanical Engineering from SJCE, Mysore

  • Contact: dheevatsa at mytum dot de

Publications/Talks:


  • TopoOpt: Optimizing the Network Topology for Distributed DNN Training; Weiyang Wang, Moein Khazraee, Zhizhen Zhong, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, Anthony Kewitsch, Manya Ghobadi; NSDI 23. arXiv:2202.00433


  • Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models; Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia et al.; Industry track at ISCA 22. arXiv:2104.05158

  • "Future Data Center Architectures and Increased Energy Efficiency", industry panel at the IEE Emerging Technologies Review, UCSB 2022. talk

  • Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization; Colin Unger, Zhihao Jia, Wei Wu, Sina Lin, Mandeep Baines, Carlos Efrain Quintero Narvaez, Vinay Ramakrishnaiah, Nirmal Prajapati, Pat McCormick, Jamaludin Mohd-Yusof, Xi Luo, Dheevatsa Mudigere, Jongsoo Park, Misha Smelyanskiy, Alex Aiken; OSDI 22. paper

  • DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction; Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen; DLP-KDD 22. arXiv:2203.11014

  • EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table; Zheng Wang, Yuke Wang, Boyuan Feng, Dheevatsa Mudigere, Bharath Muthiah, Yufei Ding; SC 22.

  • Learning to Collide: Recommendation System Model Compression with Learned Hash Functions; Benjamin Ghaemmaghami, Mustafa Ozdal, Rakesh Komuravelli, Dmitriy Korchev, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov; Preprint on arXiv. arXiv:2203.15837

  • Supporting Massive DLRM Inference Through Software Defined Memory; Ehsan K Ardestani, Changkyu Kim, Seung Jae Lee, Luoshang Pan, Jens Axboe, Valmiki Rampersad, Banit Agrawal, Fuxun Yu, Ansha Yu, Trung Le, Hector Yuen, Dheevatsa Mudigere, Shishir Juluri, Akshat Nanda, Manoj Wadekar, Krishnakumar Nair, Maxim Naumov, Chris Petersen, Mikhail Smelyanskiy, Vijay Rao; ICDCS 22. arXiv:2110.11489

  • Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models; Assaf Eisenman, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Murali Annavaram, Krishnakumar Nair, Misha Smelyanskiy; NSDI 22. arXiv:2010.08679


  • "Challenges and Opportunities for AI HW@Scale", Exec talk at OCP Global Summit 2021. talk

  • "Co-designing HW/SW at Scale for Recommendation Systems", AI HW Summit 2021. link

  • "High-performance training of DLRMs", Keynote at DLP-KDD 21. link

  • Differentiable NAS Framework and Application to Ads CTR Prediction; Ravi Krishna, Aravind Kalaiah, Bichen Wu, Maxim Naumov, Dheevatsa Mudigere, Misha Smelyanskiy, Kurt Keutzer; Preprint on arXiv. arXiv:2110.14812

  • Supporting Massive DLRM Inference Through Software Defined Memory; Ehsan K. Ardestani, Changkyu Kim, Seung Jae Lee, Luoshang Pan, Valmiki Rampersad, Jens Axboe, Banit Agrawal, Fuxun Yu, Ansha Yu, Trung Le, Hector Yuen, Shishir Juluri, Akshat Nanda, Manoj Wadekar, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov, Chris Petersen, Mikhail Smelyanskiy, Vijay Rao; Preprint on arXiv. arXiv:2110.11489

  • Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems; Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang; ISIT 21. arXiv:1909.11810


  • Building Recommender Systems with PyTorch; Dheevatsa Mudigere, Maxim Naumov, Joe Spisak, Geeta Chauhan, Narine Kokhlikyan, Amanpreet Singh, Vedanuj Goswami; Tutorial at KDD, 2020. link, talk

  • DLRM Workloads with Implications on Hardware and System Platforms; Maxim Naumov, Dheevatsa Mudigere; Executive talk at the OCP Global Summit 2020. talk

  • Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems; Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang; Conference paper, research track at KDD, 2020; blog, PeRSonAl at ISCA 2020. arXiv:1909.02107

  • Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport; Saeed Rashidi, Pallavi Shurpali, Srinivas Sridharan, Naader Hasani, Dheevatsa Mudigere, Krishnakumar Nair, Misha Smelyanskiy, Tushar Krishna; Conference paper at IEEE Symposium on High-Performance Interconnects (HotI), 2020. link, pdf

  • Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems; Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy; Preprint on arXiv. arXiv:2003.09518

  • The Architectural Implications of Facebook's DNN-based Personalized Recommendation; Udit Gupta, Xiaodong Wang, Maxim Naumov, Carole-Jean Wu, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang; Conference paper at HPCA 2020. arXiv:1906.03109

  • Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy; Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takáč; Conference paper at AISTATS 2020. arXiv:1810.11507

  • RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing; Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang; Conference paper at BARC 2020. Preprint on arXiv. arXiv:1912.12953


  • Accelerator Fabric in Facebook Zion Training System; Whitney Zhao, Dheevatsa Mudigere, Xiaodong Wang, Jongsoo Park, John Kim, Mikhail Smelyanskiy; Invited talk at NOCS, 2019.

  • HW/SW Co-design for future AI platforms - Large memory unified training platform (Zion); Dheevatsa Mudigere, Whitney Zhao; OCP Regional Summit, 2019. link

  • Zion: Facebook Next-Generation Large-memory Unified Training Platform; Naader Hasani, Dheevatsa Mudigere, Jongsoo Park, Misha Smelyanskiy, Xiaodong Wang, Whitney Zhao (in alphabetical order); Hot Chips 31, 2019.

  • Deep Learning Recommendation Model for Personalization and Recommendation Systems; Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy; Preprint on arXiv. arXiv:1906.00091, GitHub, Facebook AI blog

  • A Study of BFLOAT16 for Deep Learning Training; Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey; Preprint on arXiv. arXiv:1905.12322

  • SEERL: Sample Efficient Ensemble Reinforcement Learning; Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha and Bharat Kaul; Scaling-Up Reinforcement Learning (SURL) Workshop, IJCAI 2019 pdf; DeepRL Workshop, NeurIPS 2019.


  • Hierarchical Block Sparse Neural Networks; Dharma Teja Vooturi, Dheevatsa Mudigere, Sasikanth Avancha; Preprint on arXiv 2018. arXiv:1808.03420

  • A Progressive Batching L-BFGS Method for Machine Learning; Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang; Conference paper, long talk at ICML 2018. arXiv:1802.05374, optimization-online

  • Mixed Precision Training of Convolutional Neural Networks using Integer Operations; Dipankar Das*, Naveen Mellempudi*, Dheevatsa Mudigere*, Dhiraj Kalamkar*, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, Vadim Pirogov; Conference paper at ICLR 2018. OpenReview, arXiv:1802.00930 (* equal contribution)

  • On Scale-out Deep Learning Training for Cloud and HPC; Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey; Poster at MLSys, 2018. arXiv:1801.08030

  • Ternary Residual Networks; Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey; Poster at MLSys, 2018. arXiv:1707.0467


  • Development of a Nodal DG Solver within the SU2 Framework; Edwin van der Weide, Jae hwan Choi, Dheevatsa Mudigere, Paul Urbanczyk, Juan J. Alonso; SU2 Developers Meeting, 2017.

  • RAIL-Risk Averse Learning; Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul; Deep Reinforcement Learning Symposium, NeurIPS 2017. arXiv:1707.06658

  • Ternary Neural Networks with Fine-Grained Quantization; Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey; Preprint on arXiv, 2017. arXiv:1705.01462

  • Performance Optimizations for the SU2 Higher-Order DG-FEM Fluid Solver on the Intel Xeon Phi (KNL); Edwin van der Weide, Thomas D. Economon, Juan J. Alonso, Jae hwan Choi, Dheevatsa Mudigere, Alexander Heinecke, Gaurav Bansal; Presented at SIAM-CSE 2017 MS44 Efficiency of High-Order Methods on the 2nd Generation Intel Xeon Phi Processor.

  • Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point; Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul; Preprint on arXiv, 2017. arXiv:1701.08978

  • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima; Nitish S. Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang; Oral presentation and paper at ICLR 2017. OpenReview, poster, arXiv:1609.04836v1, code

  • Distributed Hessian-Free Optimization for Deep Neural Networks; Xi He, Dheevatsa Mudigere, Mikhail Smelyanskiy, Martin Takáč; Distributed Machine Learning Workshop, AAAI 2017. AAAI'17 paper, arXiv:1606.00511


  • On Customized Computer Arithmetic for Deep Neural Network; Ping Tak Peter Tang, Naveen Mellempudi, Dheevatsa Mudigere; Intel Arithmetic Symposium, 2016.

  • Intel® Xeon Phi™ Delivers Competitive Performance For Deep Learning—And Getting Better Fast - Blog on IA (Xeon-Phi) coverage for Baidu's DeepBench benchmark.

  • Distributed Deep Learning Using Synchronous Stochastic Gradient Descent; Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey; Preprint on arXiv, 2016. arXiv:1602.06709

  • Performance optimizations for scalable implicit RANS calculations with SU2; Thomas D. Economon, Dheevatsa Mudigere, Gaurav Bansal, Alexander Heinecke, Francisco Palacios, Jongsoo Park, Mikhail Smelyanskiy, Juan J. Alonso, Pradeep Dubey; Computers & Fluids, February 2016. doi:10.1016/j.compfluid.2016.02.003


  • Computational Challenges and Optimization Techniques for CFD Applications on Modern Parallel Systems; Anand Deshpande, Dheevatsa Mudigere; Invited talk at the HiPC, 2015.

  • ImageNet ILSVRC 2015 Object classification/localization (CLS-LOC) submission; Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Nataraj Jammalamadaka, Karthik Vaidyanathan; ILSVRC 2015. results

  • High-Performance Algebraic Multigrid Solver Optimized for Multi-Core Based Distributed Parallel Systems; Jongsoo Park, Mikhail Smelyanskiy, Ulrike Meier Yang, Dheevatsa Mudigere, Pradeep Dubey; International Conference for High Performance Computing, Networking, Storage, and Analysis SC, 2015. doi:10.1145/2807591.2807603

  • High-Performance, Modern Code Optimizations for Computational Fluid Dynamics - featured blog detailing the work on SU2 in collaboration with Aerospace Design Lab at Stanford, 2015.

  • Accelerating Computational Fluid Dynamics Code on Multi-/Many-Core Intel Platforms; Gaurav Bansal, Anand Deshpande, Paul Edwards, Alexander Heinecke, Michael Klemm, Dheevatsa Mudigere, Elmoustapha Ould-ahmed-vall, Mikhail Smelyanskiy, Michael Steyer, Nishant Agrawal, Ravi Ojha, Ambuj Pandey, Rihab Abdul Razak, Juan J. Alonso, Thomas D. Economon, Francisco Palacios, David Keyes; ParCFD, 2015. doi, pdf

  • Exploring Shared-memory Optimizations for an Unstructured Mesh CFD application on Modern Parallel Systems; Dheevatsa Mudigere, Srinivas Sridharan, Anand Deshpande, Jongsoo Park, Alexander Heinecke, Mikhail Smelyanskiy, Bharat Kaul, Pradeep Dubey, Dinesh Kaushik, and David Keyes; IPDPS, 2015. doi:10.1109/IPDPS.2015.114, pdf

  • Towards High-Performance Optimizations of the Unstructured Open-Source SU2 Suite; Thomas D. Economon, Francisco Palacios, Juan J. Alonso, Gaurav Bansal, Dheevatsa Mudigere, Anand Deshpande, Alexander Heinecke, and Mikhail Smelyanskiy; AIAA SciTech, 2015. Also at SIAM-CSE 2015 MS302 PDE-constrained Optimization using the Open-source Code SU2. doi:10.2514/6.2015-1949, pdf


  • Delayed difference scheme for large scale scientific simulations; Dheevatsa Mudigere, Sunil Sherlekar, Santosh Ansumali; Physical Review Letters, Volume 113, Issue 21, Nov 2014. doi:10.1103/PhysRevLett.113.218701, pdf.

  • Identification of Helicopter Dynamics based on Flight Data using Nature Inspired Techniques; S.N. Omkar, Dheevatsa Mudigere, Senthil K J, Vijaya Kumar M; International Journal of Applied Metaheuristic Computing, 2014. doi:10.4018/ijamc.2015070102, arXiv:1411.3251.


  • Nature inspired optimization techniques for the design optimization of laminated composite structures using failure criteria; G.N. Naik, Dheevatsa Mudigere, S.N. Omkar, S. Gopalakrishnan; Journal of Expert Systems with Applications, Volume 38, Issue 3, March 2011. doi:10.1016/j.eswa.2010.08.038


  • Fast Histograms using Adaptive CUDA Streams; Sisir Kopakka, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan; HiPC, 2010. pdf.

  • Nearly Instantaneous Reconstruction for MRIs; Srihari Narasimhan, Dheevatsa Mudigere, Babu Narayanan, Vijaya Saradhi; Invited talk at NVIDIA GTC 2010. pdf, talk.


  • Fast GPGPU Data Rearrangement Kernels using CUDA; Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan, Michael Bader, Hans‐Joachim Bungartz; HiPC, 2009. arXiv:1011.3583. - Best Presentation

  • Optimized CUDA Implementation of a Navier-Stokes based flow solver for the 2D Lid Driven Cavity; Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan, Michael Bader, Hans‐Joachim Bungartz; Poster at NVIDIA GPU Research Summit, NVIDIA GTC, 2009. pdf.

  • Data access optimized applications on the GPU using NVIDIA CUDA; Master's Thesis, Technische Universität München, 2009. pdf.


  • Multi-objective optimization of cement stabilization of soft soil using vector evaluated particle swarm optimization; S. Narendra, Dheevatsa Mudigere, P. Sivapullaiah, S.N. Omkar; International Journal of Geomechanics and Geoengineering, 2008.

  • Nature Inspired Techniques for Identification of Helicopter Dynamics Based on Flight Data; Dheevatsa Mudigere, S.N. Omkar, Senthil K J, Vijaya Kumar M; Proceedings of the Royal Aeronautical Society's 34th European Rotorcraft Forum, 2008.

  • Crop Classification using Biologically Inspired Techniques with High Resolution Satellite Image; S.N. Omkar, Senthilnath J, Dheevatsa Mudigere, Manoj Kumar M; Journal of Indian Society of Remote Sensing, Jun. 2008. doi:10.1007/s12524-008-0018-y

  • Identification of Helicopter Dynamics Based on Flight Data Using a PSO Driven Recurrent Neural Network Model; Dheevatsa Mudigere, S.N. Omkar, Vijaya Kumar M; Proceedings of the 64th Annual American Helicopter Society (AHS) Forum, 2008.


  • Vector Evaluated Particle Swarm Optimization (VEPSO) for multi‐objective design optimization of composite structures; S.N. Omkar, Dheevatsa Mudigere, Narayana G Naik, Gopalakrishnan S; Computers & Structures, 2007. doi:10.1016/j.compstruc.2007.06.004

  • Postural Assessment of Arbitrarily Taken Portrait and Profile Photographs Using ImageJ; S.N. Omkar, Manoj Kumar, Dheevatsa Mudigere; Journal of Bodywork and Movement Therapies, Volume 11, Issue 3, Jul. 2007. doi:10.1016/j.jbmt.2006.12.003, pdf

  • Urban Satellite Image Classification using Biologically Inspired Techniques; S.N. Omkar, Manoj K, Dheevatsa Mudigere, D. Muley; Proceedings of ISIE, IEEE International Symposium on Industrial Electronics, 2007. doi:10.1109/ISIE.2007.4374873, pdf.

  • Non‐Linear Dynamical System Identification Using Particle Swarm Optimization; S. N. Omkar, Dheevatsa Mudigere; Proceedings of the International Conference on Advances in Control and Optimization of Dynamical Systems (ACODS’2007), 2007. pdf - Best Paper

Other Random Stuff:

A few other things that make me tick: gadgets, traveling, and driving.