Ajay Divakaran
Bio
Ajay Divakaran, Ph.D., is the Senior Technical Director of the Vision and Learning Lab at the Center for Vision Technologies, SRI International, Princeton. Divakaran has been a principal investigator for several SRI research projects for DARPA, IARPA, ONR, and other agencies. His work includes multimodal content comprehension, multimodal conversation understanding and dialog management, multimodal analytics for social media, real-time human behavior assessment, and event detection. He has helped develop several innovative technologies for government and commercial multimodal systems, such as Passio's Meal Scan feature for MyFitnessPal and driver drowsiness detection for Toyota. He worked at Mitsubishi Electric Research Labs from 1998 to 2008, where he was the lead inventor of the world's first sports highlights playback-enabled DVR and developed several machine learning applications. Divakaran was named a Fellow of the IEEE in 2011 for his contributions to multimedia content analysis. He has authored two books, 130+ publications, and 60+ issued patents. He received his Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute.
Links
SRI Webpage
https://www.sri.com/about/people/ajay-divakaran
SRI Dish
SRI's Multimodal Content Recommendation Commercialized by SRI Spin-off Vitrina
https://medium.com/dish/vitrina-ai-the-future-of-video-licensing-transactions-a4874355ce03
SRI Food Recognition Technology Commercialized by SRI Spin-off Passio
https://blog.myfitnesspal.com/meal-scan/
SRI's Food Recognition Paper from 2015
https://pubmed.ncbi.nlm.nih.gov/25901024/
SRI Featured Innovator
https://medium.com/dish/featured-innovator-ajay-divakaran-adab82907ed
Ajay Divakaran talks about big data, social media influence, and robotic navigation
https://www.youtube.com/watch?v=y3H0hNAyFd0&feature=youtu.be
Linkedin Profile
https://www.linkedin.com/in/ajay-divakaran-3445361/
Google Scholar
dblp
https://dblp.uni-trier.de/pers/d/Divakaran:Ajay.html
Twitter
Longer Bio
Ajay Divakaran, Ph.D., is the Technical Director of the Vision and Learning Lab at the Center for Vision Technologies, SRI International, Princeton. Divakaran has been a principal investigator for several SRI research projects for DARPA, IARPA, ONR, and other agencies. His work includes multimodal social media analytics; vision and language; knowledge-guided machine learning; multimodal modeling and analysis of affective, cognitive, and physiological aspects of human behavior; interactive virtual reality-based training; tracking of individuals in dense crowds and multi-camera tracking; technology for automatic food identification and volume estimation; and analytics for event detection in open-source video. He has developed several innovative technologies for multimodal systems in both commercial and government programs during his career. Prior to joining SRI in 2008, Divakaran worked at Mitsubishi Electric Research Labs for 10 years, where he was the lead inventor of the world's first sports highlights playback-enabled DVR. He also oversaw a wide variety of product applications of machine learning. Divakaran was named a Fellow of the IEEE in 2011 for his contributions to multimedia content analysis. He developed techniques for recognition of agitated speech as part of his work on automatic sports highlights extraction from broadcast sports video. He established a sound experimental and theoretical framework for human perception of action in video sequences as lead inventor of the MPEG-7 video standard motion activity descriptor. He serves on the technical program committees of key multimedia conferences and served as an associate editor of IEEE Transactions on Multimedia from 2007 to 2010. He currently serves on the editorial board of IEEE Intelligent Systems. He has authored two books and has more than 130 publications to his credit, as well as more than 60 issued patents. He has supervised four Ph.D. theses. He was a research associate at the ECE Dept., IISc, from September 1994 to February 1995, and a scientist with Iterated Systems Incorporated, Atlanta, GA, from 1995 to 1998. Divakaran received his M.S. and Ph.D. degrees in electrical engineering from Rensselaer Polytechnic Institute. His B.E. in electronics and communication engineering is from the University of Jodhpur, India, where he was a lecturer in 1985-86.
He has taught at multiple levels, including second-grade children (math), incoming college freshmen (math), EE undergraduates (electronic circuits and control systems), and Ph.D. students. He has an interest in special needs students. He is a fluent Japanese speaker and can speak survival French. He also speaks Hindi (native), Telugu, Tamil, and Marwari, in roughly descending order of fluency. He has been learning Hindustani vocal music from the prominent Hindustani vocalist Mrs. Kumkum Sanyal since 2003.
Selected Recent Publications
Pritish Sahu, Michael Cogswell, Sara Rutherford-Quach, Ajay Divakaran
Comprehension Based Question Answering using Bloom's Taxonomy
To Appear at the 6th Workshop on Representation Learning for NLP, 2021
Pritish Sahu, Karan Sikka, Ajay Divakaran
Towards Multimodal Comprehension
ICCV 2021
Yunye Gong, Xiao Lin, Yi Yao, Thomas G. Dietterich, Ajay Divakaran, Melinda Gervasio
Confidence Calibration for Cross-Domain Generalization under Covariate Shift (To appear at ICCV 2021)
Xiao Lin, Meng Ye, Yunye Gong, Giedrius Buracas, Nikoletta Basiou, Ajay Divakaran, Yi Yao
Modular Adaptation for Cross-Domain Few-Shot Learning
Arijit Ray, Michael Cogswell, Xiao Lin, Kamran Alipour, Ajay Divakaran, Yi Yao, Giedrius Burachas
Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness
Karan Sikka, Indranil Sur, Susmit Jha, Anirban Roy, Ajay Divakaran
"Detecting Trojaned DNNs Using Counterfactual Attributions," arXiv:2012.02275
Karan Sikka, Jihua Huang, Andrew Silberfarb, Prateeth Nayak, Luke Rohrer, Pritish Sahu, John Byrnes, Ajay Divakaran, Richard Rohwer, "Zero-Shot Learning with Knowledge Enhanced Visual Semantic Embeddings," arXiv:2011.10889
Meng Ye, Xiao Lin, Giedrius Burachas, Ajay Divakaran, Yi Yao, "Hybrid Consistency Training with Prototype Adaptation for Few-Shot Learning," arXiv submission, November 2020
https://arxiv.org/abs/2011.10082
4th Lifelong Machine Learning Workshop at ICML 2020
Raghavan, A., Hostetler, J., Sur, I., Rahman, A., & Divakaran, A. (2020). Lifelong Learning using Eigentasks: Task Separation, Skill Acquisition, and Selective Transfer. 4th Lifelong Machine Learning Workshop, Proceedings of the 37th International Conference on Machine Learning (ICML), PMLR, 8.
Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks
ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations.
Progressive Growing of Neural ODEs
WACV 2019
Pallabi Ghosh, Yi Yao, Larry S. Davis, Ajay Divakaran:
Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation.
WACV poster presentation at 24:25
https://www.youtube.com/watch?v=zZDhauFsOUo
FoodX-251: A Dataset for Fine-grained Food Classification
"Brain to Brain" communications
Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks
MatchStax Multimodal Embedding API
Demo: Video Retrieval with MatchStax
Demo Video: Cross-Platform Retrieval with MatchStax (Twitter-Instagram)
SRI's Multimodal Content Recommendation Commercialized by SRI Spin-off Vitrina
https://medium.com/dish/vitrina-ai-the-future-of-video-licensing-transactions-a4874355ce03
EMNLP 2019
Multi-modal Document Intent in Instagram Posts
Demo Video
Demo Video: Multimodal Document Intent
ICCV 2019
Align2Ground: Weakly Supervised Phrase Grounding guided by Image-Caption Alignment
ECCV 2018
Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, Ajay Divakaran:
Zero-Shot Object Detection. ECCV (1) 2018: 397-414
ICMI 2015
https://drive.google.com/open?id=0B1TzavQVNsXGcVFmR3RBNy1QdWc
ICME 2015
https://drive.google.com/open?id=0B1TzavQVNsXGdmFLVzJ4SmNldlk
Some Pertinent Videos
Understanding Group Interactions in STEM
NSF Project led by Education Division in collaboration with Center for Vision Technology, SRI International
https://stemforall2021.videohall.com/presentations/2147
MatchStax Multimodal Embedding API
Video Retrieval with MatchStax
https://www.youtube.com/watch?v=NFmM4ZlMPTY
Cross-Platform (Instagram-Twitter) Retrieval with MatchStax
Ajay Divakaran talks about big data, social media influence, and robotic navigation
https://www.youtube.com/watch?v=y3H0hNAyFd0&feature=youtu.be
Xiao Lin talks about artificial intelligence and brain-to-brain data transfer
https://www.youtube.com/watch?v=NMDEs1DybXU
Jesse Hostetler talks about Lifelong Learning and getting Robots to dream
https://www.youtube.com/watch?v=S5Co1T_uuDE
Julia Kruk talks about the evolution of human communication through social media
https://www.youtube.com/watch?v=1yQ3-fN9HV4
Karan Sikka talks about embedding knowledge in machine learning models
https://www.youtube.com/watch?v=YPrXavrlqzs
Yunye Gong talks about applying physics-based models to deep learning
https://www.youtube.com/watch?v=lxdp6d_Ih94&t=318s
WACV poster presentation at 24:25
https://www.youtube.com/watch?v=zZDhauFsOUo
Amir Tamrakar: Incorporating Conversational AI in Automotive Applications
https://www.youtube.com/watch?v=bKJideEmyss
Multimodal Document Intent in Instagram Posts
MIBA Demo, January 2015 (Final Draft)
https://youtu.be/t_CbYo5ow04
DARPA M3I Multimodal Embedding for Social Media Analytics Demo
SRI Vision and Learning
Amir Tamrakar
News
SRI's Driver Monitoring System wins AutoTech Breakthrough Award in the Sensor Systems category
https://www.sri.com/announcements/sri-selected-as-autotech-breaktrhough-award-winner/
https://autotechbreakthrough.com/2020-winners/
https://scholar.google.com/citations?user=nBUpZ-EAAAAJ&hl=en
Recent Publications
Jihua Huang, Amir Tamrakar, "ACE-Net: Fine-Level Face Alignment through Anchors and Contours Estimation"
HAI 2020
Sujeong Kim, David Salter, Luke DeLuccia, Amir Tamrakar. 2020. Study on Text-based and Voice-based Dialogue Interfaces for Human-Computer Interactions in a Blocks World. In Proceedings of the 8th International Conference on Human-Agent Interaction (HAI ’20), November 10–13, 2020, Virtual Event, NSW, Australia. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3406499.3418754
Anirudh Som, Sujeong Kim, Bladimir Lopez-Prado, Svati Dhamija, Nonye Alozie, Amir Tamrakar, European Conference on Computer Vision (ECCV) Workshops, 2020
Alozie, N., Dhamija, S., McBride, E., & Tamrakar, A. (2020, June). Automated collaboration assessment using behavioral analytics. International Conference of the Learning Sciences (ICLS), Nashville, TN
Alozie, N., McBride, E., Dhamija, S., & Tamrakar, A. (2020, April). Collaboration Conceptual Model to Inform the Development of Machine Learning Models Using Behavioral Analytics. San Francisco, CA: American Educational Research Association (AERA)
Incorporating Conversational AI in Automotive Applications
https://www.youtube.com/watch?v=bKJideEmyss
SMILEE developed under DARPA CwC
http://aclweb.org/anthology/N18-5018
This is the 2-minute short version for the actual submission: https://youtu.be/2iM5t7cpua0
This is the director’s cut, a.k.a. the longer version: https://youtu.be/hfE7j7PRWro
Aesop developed under DARPA CwC by Mohamed Amer
Aesop human-computer storytelling
Yi Yao
https://scholar.google.com/citations?user=iD6QaXcAAAAJ&hl=en
ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations.
Progressive Growing of Neural ODEs
http://arxiv.org/abs/2003.03695
WACV 2019
Pallabi Ghosh, Yi Yao, Larry S. Davis, Ajay Divakaran:
Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation.
https://arxiv.org/pdf/1811.10575v1.pdf
WACV poster presentation at 24:25
https://www.youtube.com/watch?v=zZDhauFsOUo
NeurIPS 2019 (Work done by Mohamed Amer under DARPA XAI)
https://arxiv.org/abs/1905.02850
Karan Sikka
SRI's Multimodal Content Recommendation Commercialized by SRI Spin-off Vitrina
https://medium.com/dish/vitrina-ai-the-future-of-video-licensing-transactions-a4874355ce03
Karan Sikka talks about embedding knowledge in machine learning models
https://www.youtube.com/watch?v=YPrXavrlqzs
Xiao Lin
Xiao Lin talks about artificial intelligence and brain-to-brain data transfer
https://www.youtube.com/watch?v=NMDEs1DybXU
https://scholar.google.com/citations?user=zSIbUH4AAAAJ&hl=en
Xiao Lin, Meng Ye, Yunye Gong, Giedrius Buracas, Nikoletta Basiou, Ajay Divakaran, Yi Yao
Modular Adaptation for Cross-Domain Few-Shot Learning
Yunye Gong, Xiao Lin, Yi Yao, Thomas G. Dietterich, Ajay Divakaran, Melinda Gervasio
Confidence Calibration for Cross-Domain Generalization under Covariate Shift
https://filebox.ece.vt.edu/~linxiao/
Anirban Roy
https://scholar.google.com/citations?user=N9eSuR4AAAAJ&hl=en
Jesse Hostetler
Jesse Hostetler talks about Lifelong Learning and getting Robots to dream
https://www.youtube.com/watch?v=S5Co1T_uuDE
https://jhostetler.github.io/
https://scholar.google.com/citations?user=ngJLn9EAAAAJ&hl=en
Meng Ye
Meng Ye, Xiao Lin, Giedrius Burachas, Ajay Divakaran, Yi Yao, "Hybrid Consistency Training with Prototype Adaptation for Few-Shot Learning," arXiv submission, November 2020
https://arxiv.org/abs/2011.10082
https://scholar.google.com/citations?user=YMeDRE8AAAAJ&hl=en
https://sites.google.com/site/mengye1225/
Yunye Gong
Yunye Gong, Xiao Lin, Yi Yao, Thomas G. Dietterich, Ajay Divakaran, Melinda Gervasio
Confidence Calibration for Cross-Domain Generalization under Covariate Shift
Yunye Gong talks about applying physics-based models to deep learning
https://www.youtube.com/watch?v=lxdp6d_Ih94&t=318s
https://doerschuklab.bme.cornell.edu/people/yunye-gong/
https://www.linkedin.com/in/yunye-gong-192b5629
https://dblp.uni-trier.de/pers/hd/g/Gong:Yunye
Arijit Ray
https://filebox.ece.vt.edu/~ray93/
Arijit Ray, Michael Cogswell, Xiao Lin, Kamran Alipour, Ajay Divakaran, Yi Yao, Giedrius Burachas
Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness
Arijit Ray, Karan Sikka, Ajay Divakaran, Stefan Lee, Giedrius Burachas, Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation, 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019); also at the CVPR 2019 VQA and Visual Dialog Workshop
Arijit Ray, Yi Yao, Rakesh Kumar, Ajay Divakaran, Giedrius Burachas, Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval, 2019 AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2019)
Julia Kruk
Julia Kruk talks about the evolution of human communication through social media
https://www.youtube.com/watch?v=1yQ3-fN9HV4
EMNLP 2019
Multi-modal Document Intent in Instagram Posts
https://arxiv.org/abs/1904.09073
Demo Video
Some Former and Current Collaborators
Nick Vander Valk
https://dblp.org/pers/v/Valk:Nick_Vander.html
Mohamed Amer
Aswin Raghavan
https://sites.google.com/site/ashwinnr/home
Shih-Fu Chang
Uri Hasson
Daniel Jurafsky
VS Subrahmanian
Jure Leskovec
Regunathan Radhakrishnan
Ziyou Xiong
Lexing Xie
Kadir Peker
Huifang Sun
Anthony Vetro
Isao Otsuka
Ajay Divakaran's Music
Disciple of the prominent Hindustani vocalist Mrs. Kumkum Sanyal since 2003
https://www.youtube.com/user/ajaydivakaran
Raga Bageshri
https://www.youtube.com/watch?v=wi7Y-ULmhm8
Cafe Improv 2016
http://cafeimprov.weebly.com/best-of-cafe-improv-2016.html
Cafe Improv 2017
https://northofoxford.wordpress.com/2018/01/19/best-of-cafe-improv-2017/