Mehrdad Farajtabar's Website

I'm currently a senior research manager at Apple, leading a team of engineers and scientists to understand and improve reasoning and planning capabilities of large language models (LLMs). My goal is to close the reasoning and intelligence gap between frontier models and the genuine reasoning (General Intelligence) that humans possess. I also work on optimizing LLMs to run on-device with efficient inference. In general, I like to work on understanding and demystifying how large vision and language models work and learn, in order to find more accurate and efficient pre-training and fine-tuning architectures, algorithms, and strategies.

Before that, I was a senior research scientist at DeepMind. As a research scientist, I worked on continual and lifelong learning, multitask and transfer learning, understanding the training dynamics of deep neural networks, and reinforcement learning. These research areas were in line with DeepMind's mission towards Artificial General Intelligence. As an applied scientist and engineer, I worked on applications of machine learning, e.g., using recommendation/predictive models, meta learning, causal inference, and reinforcement learning to improve Google's products such as YouTube, Cloud, and Sales.

I received my Ph.D. in Computational Science and Engineering from Georgia Institute of Technology, under the supervision of Hongyuan Zha and Le Song. I used to work on modeling and optimization of sequential events data, stochastic point processes, and dynamics of and on networks. During my PhD I interned at DeepMind, Microsoft Research, Max Planck Institute for Software Systems, and Google, working on predicting and leveraging health data, information reliability, and analyzing Google Maps local listings data. I received my M.Sc. in Artificial Intelligence from the Computer Engineering Department at Sharif University of Technology and my B.Sc. from the same university in Software Engineering, in 2011 and 2009, respectively.

Google Scholar LinkedIn Twitter

Prospective Candidates: Our group may have openings for exceptional candidates on LLM reasoning, planning, agentic LLMs, tool use, and LLM inference efficiency! If you have prior hands-on experience and a proven track record of related research papers at top conferences on the above topics, don't hesitate to reach out via my last name at apple dot com!

Work Experiences

Senior Manager of Research (2022 - Present)

Apple, Seattle, Washington
Applied Research and Innovation in Large Language and Vision Models

Senior Research Scientist (2018 - 2022)

DeepMind, Mountain View, California
Applied Research on Deep Learning and Reinforcement Learning

Recent Interests

Large Language Models, Inference Efficiency, LLM Reasoning, Planning and Generalization
Vision‐Language Models, Foundation Models, Efficient Model Training
Continual and Lifelong Learning, Multitask and Transfer Learning, Meta Learning

Education

PhD in Computational Science and Engineering
- Georgia Institute of Technology
- 2013 - 2018
MSc in Computational Science and Engineering
- Georgia Institute of Technology
- 2013 - 2016
MSc in Artificial Intelligence
- Sharif University of Technology
- 2009 - 2011
BSc in Software Engineering
- Sharif University of Technology
- 2005 - 2009

Internships and Miscellaneous Industry Experience

DeepMind, Mountain View, CA
- Research Intern
- Oct 2017 - Dec 2017
Microsoft Research, Redmond, WA
- Research Intern
- May 2016 - Aug 2016
Max-Planck Institute, Kaiserslautern, Germany
- Research Intern
- May 2015 - Aug 2015
Google, Mountain View, CA
- Software Engineering Intern
- May 2014 - Aug 2014

Professional Services

Area Chair

ICLR 2024, 2025
NeurIPS 2025
CVPR 2024

Program Committee/Reviewer

I've occasionally reviewed for the following conferences, workshops, and journals in the past:

NeurIPS, AISTATS, ICML, ICLR, UAI, WWW, AAAI, WSDM, ASONAM, IJCAI, CVPR, CoLLAs
IEEE Transactions on Knowledge and Data Engineering, The Computer Journal, IEEE Transactions on Neural Networks and Learning Systems, ACM Transactions on the Web, Machine Learning Journal

Recent Papers

Please refer to my Google Scholar for the updated list of my publications.

Papers from 2018 to 2023

LLM in a flash: Efficient Large Language Model Inference with Limited Memory [arXiv]
K Alizadeh, I Mirzadeh, D Blenko, K Khatamifard, M Cho, CCD Mundo, M Rastegari, M Farajtabar
arXiv:2312.11514, 2023
Weight Subcloning: Directly Initializing Transformers using Larger Pretrained Models [arXiv]
M Samragh, M Farajtabar, F Faghri, R Vemulapalli, S Mehta, O Tuzel, D Naik, M Rastegari
arXiv:2312.09299, 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models [arXiv]
I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo, O Tuzel, G Samei, M Rastegari, M Farajtabar
NeurIPS workshop on Efficient Natural Language and Speech Processing, 2023
TiC-CLIP: Continual Training of CLIP Models [arXiv]
S Garg, M Farajtabar, H Pouransari, R Vemulapalli, S Mehta, O Tuzel, F Faghri
NeurIPS Workshop on Distribution Shifts: New Frontiers with Foundation Models, 2023
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement [arXiv]
M Salehi, M Farajtabar, M Horton, F Faghri, H Pouransari, R Vemulapalli, A Farhadi, O Tuzel, M Rastegari, S Mehta
NeurIPS Workshop on Unifying Representations in Neural Models, 2023
Label-efficient Training of Small Task-specific Models by Leveraging Vision Foundation Models [arXiv]
R Vemulapalli, H Pouransari, F Faghri, S Mehta, M Farajtabar, M Rastegari, O Tuzel
arXiv preprint arXiv:2311.18237, 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding [arXiv]
H Wang, PKA Vasu, F Faghri, R Vemulapalli, M Farajtabar, S Mehta, M Rastegari, O Tuzel, H Pouransari
NeurIPS Workshop on Unifying Representations in Neural Models, 2023
On the Efficacy of Multi-scale Data Samplers for Vision Applications [arXiv]
E Nunez, T Merth, A Prabhu, M Farajtabar, M Rastegari, S Mehta, M Horton
arXiv preprint arXiv:2309.04502, 2023
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement [arXiv]
F Faghri, H Pouransari, S Mehta, M Farajtabar, A Farhadi, M Rastegari, O Tuzel
The IEEE International Conference on Computer Vision (ICCV), 2023
An empirical study of implicit regularization in deep offline RL [arXiv]
C Gulcehre, S Srinivasan, J Sygnowski, G Ostrovski, M Farajtabar, M Hoffman, R Pascanu, A Doucet
Transactions on Machine Learning Research (TMLR), 2022
Architecture Matters in Continual Learning [arXiv]
I Mirzadeh, A Chaudhry, D Yin, T Nguyen, R Pascanu, D Gorur, M Farajtabar
arXiv preprint, 2022
Efficient Continual Learning in Neural Network Subspaces [arXiv]
T Doan, I Mirzadeh, J Pineau, M Farajtabar
Conference on Lifelong Learning Agents (CoLLAs), 2023
Wide Neural Networks Forget Less Catastrophically [arXiv]
I Mirzadeh, A Chaudhry, D Yin, H Hu, R Pascanu, D Gorur, M Farajtabar
International Conference on Machine Learning (ICML), 2022
Linear Mode Connectivity in Multitask and Continual Learning [paper] [code] [video]
I Mirzadeh*, M Farajtabar*, D Gorur, R Pascanu, H Ghasemzadeh
International Conference on Learning Representations (ICLR), 2021
Balance Regularized Neural Network Models for Causal Effect Estimation [arXiv] [slides] [video]
M Farajtabar, A Lee, Y Feng, V Gupta, P Dolan, H Chandran, M Szummer
Causal Discovery & Causality-Inspired Machine Learning Workshop (NeurIPS), 2020
Understanding the Role of Training Regimes in Continual Learning [paper] [arXiv] [slides] [code] [video]
I Mirzadeh, M Farajtabar, R Pascanu, H Ghasemzadeh
Neural Information Processing Systems (NeurIPS), 2020
Self-distillation Amplifies Regularization in Hilbert Space [paper] [arXiv] [video]
H Mobahi, M Farajtabar, PL Bartlett
Neural Information Processing Systems (NeurIPS), 2020
A Maximum-entropy Approach to Off-policy Evaluation in Average-reward MDPs [paper] [arXiv]
N Lazic, D Yin, M Farajtabar, N Levine, D Gorur, C Harris, D Schuurmans
Neural Information Processing Systems (NeurIPS), 2020
Learning to Incentivize Other Learning Agents [paper] [arXiv] [code]
J Yang, A Li, M Farajtabar, P Sunehag, E Hughes, H Zha
Neural Information Processing Systems (NeurIPS), 2020
Orthogonal Gradient Descent for Continual Learning [paper] [arXiv] [slides]
M Farajtabar, N Azizan, A Mott, A Li
The International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Task-agnostic Continual Learning with Hybrid Probabilistic Models
P Kirichenko, M Farajtabar, D Rao, B Lakshminarayanan, N Levine, A Li, H Hu, A Wilson, R Pascanu
INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, (ICML), 2021
The Effectiveness of Memory Replay in Large Scale Continual Learning [arXiv]
Y Balaji, M Farajtabar, D Yin, A Mott, A Li
Workshop on Continual Learning in Computer Vision, (CVPR), 2021
Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [arXiv]
D Yin, M Farajtabar, A Li, N Levine, A Mott
Workshop on Continual Learning, (ICML), 2021
Adapting Auxiliary Losses using Gradient Similarity [arXiv]
Y Du, WM Czarnecki, SM Jayakumar, M Farajtabar, R Pascanu, B Lakshminarayanan
arXiv preprint arXiv:1812.02224, 2021
Dropout as an Implicit Gating Mechanism for Continual Learning [paper] [code] [video]
I Mirzadeh, M Farajtabar, H Ghasemzadeh
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020
Improved Knowledge Distillation via Teacher Assistant [paper] [arXiv] [code] [poster] [video]
I Mirzadeh*, M Farajtabar*, A Li, N Levine, A Matsukawa, H Ghasemzadeh
The AAAI Conference on Artificial Intelligence (AAAI), 2020
DyRep: Learning Representations over Dynamic Graphs [paper]
R Trivedi, M Farajtabar, P Biswal, H Zha
The International Conference on Learning Representations (ICLR), 2019
Cross-View Policy Learning for Street Navigation [paper]
A Li, H Hu, P Mirowski, M Farajtabar
The IEEE International Conference on Computer Vision (ICCV), 2019
Learning Time Series Associated Event Sequences with Recurrent Point Process Networks [paper]
S Xiao, J Yan, M Farajtabar, L Song, X Yang, H Zha
IEEE Transactions on Neural Networks and Learning Systems, 2019
Modeling Behaviors and Lifestyle with Online and Social Data for Predicting and Analyzing Sleep and Exercise Quality [paper]
M Farajtabar, E Kıcıman, G Nathan, RW White
International Journal of Data Science and Analytics, 2019
More Robust Doubly Robust Off-policy Evaluation [paper]
M Farajtabar*, Y Chow*, M Ghavamzadeh
International Conference on Machine Learning (ICML), 2018

Older Publications (PhD and pre-PhD)

Conference

Discrete Interventions in Hawkes Processes with Applications in Invasive Species Management
A. Gupta, M. Farajtabar, B. Dilkina and H. Zha
International Joint Conference on Artificial Intelligence, (IJCAI-ECAI), 2018
Learning Conditional Generative Models for Temporal Point Processes
S. Xiao, H. Xu, J. Yan, M. Farajtabar, X. Yang, L. Song, H. Zha
AAAI Conference on Artificial Intelligence, (AAAI), 2018
Wasserstein Learning of Deep Generative Point Process Models
S. Xiao*, M. Farajtabar*, X. Ye, J. Yan, L. Song, H. Zha
Neural Information Processing Systems (NIPS), 2017, Long Beach, CA, USA. * denotes equal contribution!
Fake News Mitigation via Point Processes Based Intervention
M. Farajtabar, J. Yang, X. Ye, R. Trivedi, E. Khalil, S. Li, H. Xu, L. Song, H. Zha
International Conference on Machine Learning (ICML), 2017, Sydney, Australia.
Recurrent Poisson Factorization for Temporal Recommendation
S. A. Hosseini, K. Alizadeh, A. Khodadadi, A. Arabzadeh, M. Farajtabar, H. Zha, H. R. Rabiee
International Conference on Knowledge Discovery and Data Mining (KDD), 2017, Halifax, Canada.
Distilling Information Reliability and Source Trustworthiness from Digital Traces
B. Tabibian, I. Valera, M. Farajtabar, L. Song, B. Schoelkopf, M. Gomez-Rodriguez
World Wide Web Conference (WWW), 2017, Perth, Australia.
Correlated Cascades: Compete or Cooperate
A. Zarezade, A. Khodadadi, M. Farajtabar, H. R. Rabiee, L. Song, and H. Zha
AAAI Conference on Artificial Intelligence (AAAI), 2017, San Francisco, USA.
Multi-stage Campaigning in Social Networks
M. Farajtabar, X. Ye, S. Harati, L. Song, H. Zha
Neural Information Processing Systems (NIPS), 2016, Barcelona, Spain.
Smart broadcasting: Do you want to be seen?
M. Karimi, E. Tavakoli, M. Farajtabar, L. Song, M. Gomez-Rodriguez.
International Conference on Knowledge Discovery and Data Mining (KDD), 2016, San Francisco, USA.
Learning Granger Causality for Hawkes Processes
H. Xu, M. Farajtabar, H. Zha
International Conference on Machine Learning (ICML), 2016, New York, USA.
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, L. Song
Neural Information Processing Systems (NIPS), 2015, Montreal, Quebec, Canada.
Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams
N. Du, M. Farajtabar, A. Ahmed, A. J. Smola, L. Song
International Conference on Knowledge Discovery and Data Mining (KDD), 2015, Sydney, Australia.
Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods
A. Shaban, M. Farajtabar, B. Xie, L. Song, B. Boots
The Conference on Uncertainty in Artificial Intelligence (UAI), 2015, Amsterdam, Netherlands.
Back to the Past: Source Identification in Diffusion Networks from Partially Observed Cascades
M. Farajtabar, M. Gomez-Rodriguez, N. Du, M. Zamani, H. Zha, L. Song
International Conference on Artificial Intelligence and Statistics (AISTATS), 2015, San Diego, CA, USA.
Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods
A. Shaban, M. Farajtabar, B. Xie, L. Song, B. Boots
Workshop on Non-convex Optimization for Machine Learning: Theory and Practice (NIPS) 2015, Montreal, Quebec, Canada.
Co-evolutionary Dynamics of Information Diffusion and Network Structure
M. Farajtabar, M. Gomez-Rodriguez, Y. Wang, S. Li, H. Zha, L. Song.
Workshop on Activity and Events in Networks: Models, Methods, and Applications (WWW) 2015, Florence, Italy.
NetCodec: Community Detection from Individual Activities
T. Q. Long, M. Farajtabar, L. Song, H. Zha
SIAM Conference on Data Mining (SDM), 2015, Vancouver, British-Columbia, Canada.
Shaping Social Activity by Incentivizing Users
M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, H. Zha, L. Song
Neural Information Processing Systems (NIPS), 2014, Montreal, Quebec, Canada.
The Network You Keep: Analyzing Persons of Interest Through Network Decomposition
S. Shokat-Fadaee, M. Farajtabar, R. Sundaram, J. A. Aslam
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2014, Beijing, China.
From Local Similarity to Global Coding; An Application to Image Classification
A. Shaban, H. R. Rabiee, M. Farajtabar, M. Ghazvininejad
Computer Vision and Pattern Recognition (CVPR), 2013, Portland, Oregon, USA.
Online Object Representation Learning and its Application to Object Tracking
A. Shaban, H. R. Rabiee, M. Farajtabar, M. Fadaee
Spring Symposium on Lifelong Machine Learning (AAAI), 2013, Stanford, CA, USA.
Manifold Coarse Graining for Online Semi-supervised Learning
M. Farajtabar, A. Shaban, H. R. Rabiee, M. H. Rohban
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2011, Athens, Greece
Efficient Iterative Semi-supervised Classification on Manifold
M. Farajtabar, H. R. Rabiee, A. Shaban, A. Soltani-Farani
Workshop on Optimization Based Methods for Emerging Data Mining Problems, in conjunction with International Conference on Data Mining (ICDM), 2011, Vancouver, British-Columbia, Canada.
The Inefficiency of Equilibria in a Network Creation Game with Packet Forwarding
M. Fazli, K. Khodamoradi, M. Farajtabar, M. Ghazvininejad, M. Ghodsi
International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), 2009, Czech Republic.

Journal

An empirical study of implicit regularization in deep offline RL
C Gulcehre, S Srinivasan, J Sygnowski, G Ostrovski, M Farajtabar, M Hoffman, R Pascanu, A Doucet
Transactions on Machine Learning Research (TMLR), 2022
Learning time series associated event sequences with recurrent point process networks
S Xiao, J Yan, M Farajtabar, L Song, X Yang, H Zha
IEEE Transactions on Neural Networks and Learning Systems, 2019
Modeling behaviors and lifestyle with online and social data for predicting and analyzing sleep and exercise quality
M Farajtabar, E Kıcıman, G Nathan, RW White
International Journal of Data Science and Analytics, 2019
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
M. Farajtabar, M. Gomez-Rodriguez, Y. Wang, S. Li, H. Zha, L. Song
The Web Conference, Journal Track, 2018
On The Network You Keep: Analyzing Persons of Interest using Cliqster
S. Shokat-Fadaee, M. Farajtabar, R. Sundaram, J. A. Aslam, N. Passas
Social Network Analysis and Mining, 2015, Montreal, Quebec, DOI: 10.1007/s13278-015-0302-0.
Detecting Weak Changes in Dynamic Events over Networks
S. Li, Y. Xie, M. Farajtabar, A. Verma, L. Song
IEEE Transactions on Signal and Information Processing over Networks, 2017
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
M. Farajtabar, M. Gomez-Rodriguez, Y. Wang, S. Li, H. Zha, L. Song
Journal of Machine Learning Research (JMLR), 2017