Publications

Journals

2016

Park S., Shim H.S., Chatterjee M., Sagae K., Morency L.P. Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia, ACM Transactions on Interactive Intelligent Systems (ACM TiiS) [pdf] [Project]

Conferences

2024

Liu X., Paul S., Chatterjee M., Cherian A. CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments, AAAI Conference on Artificial Intelligence, Vancouver, Canada 2024 (AAAI 2024) [pdf] [Conference Link]

2022

Chatterjee M., Ahuja N., Cherian A. Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation, Advances in Neural Information Processing Systems, New Orleans, USA 2022 (NeurIPS 2022) [pdf] [Conference Link] [Project]
Harvill J., Wani Y., Chatterjee M., Alam M., Beiser D., Chestek D., Hasegawa-Johnson M., Ahuja N. Detection of COVID-19 from Joint Time and Frequency Analysis of Speech, Breathing, and Cough Audio, IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore 2022 (IEEE ICASSP 2022) [pdf] [Conference Link]

2021

Chatterjee M., Le Roux J., Ahuja N., Cherian A. Visual Scene Graphs for Audio Source Separation, International Conference on Computer Vision, Montreal, Canada 2021 (ICCV 2021) [pdf] [Conference Link] [Project]
Chatterjee M., Ahuja N., Cherian A. A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction, International Conference on Computer Vision, Montreal, Canada 2021 (ICCV 2021) (Oral) [pdf] [Conference Link] [Project]
Geng S., Gao P., Chatterjee M., Hori C., Le Roux J., Zhang Y., Li H., Cherian A. Dynamic Graph Representation Learning for Video Dialog via Multi-modal Shuffled Transformers, AAAI Conference on Artificial Intelligence, Virtual 2021 (AAAI 2021) [pdf] [Conference Link]

2020

Chatterjee M., Cherian A. Sound2Sight: Generating Visual Dynamics from Sound and Context, European Conference on Computer Vision, Glasgow, UK 2020 (ECCV 2020) [pdf] [Conference Link] [Project]

2018

Chatterjee M.*, Dubey A.*, Ahuja N. Coreset-Based Neural Network Compression, European Conference on Computer Vision, Munich, Germany 2018 (ECCV 2018) [* - indicates equal contribution] [pdf] [Conference Link] [Project]
Chatterjee M., Schwing A.G. Diverse and Coherent Paragraph Generation from Images, European Conference on Computer Vision, Munich, Germany 2018 (ECCV 2018) [pdf] [Conference Link] [Project]

2016

Subramaniam A., Chatterjee M., Mittal A. Deep Neural Networks with Inexact Matching for Person Re-Identification, Advances in Neural Information Processing Systems, Barcelona, Spain 2016 (NIPS 2016) [pdf] [Conference Link] [Project]

2015

Chatterjee M., Park S., Morency L.P., Scherer S. Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits, ACM International Conference on Multimodal Interaction, Seattle, USA 2015 (ACM ICMI 2015) (Oral) (Outstanding Paper Award) [pdf] [Conference Link] [Project]
Chatterjee M., Leuski A. CRMActive: An Active Learning Based Approach for Effective Video Annotation and Retrieval, ACM International Conference on Multimedia Retrieval, Shanghai, China 2015 (ACM ICMR 2015) [pdf] [Conference Link] [Dataset] [Project]
Chatterjee M., Leuski A. A Novel Statistical Approach to Image and Video Retrieval and Its Adaptation for Active Learning, ACM International Conference on Multimedia, Brisbane, Australia 2015 (ACM MM 2015) [pdf] [Conference Link] [Project]
Shim H.S., Chatterjee M.*, Park S.*, Scherer S., Sagae K., Morency L.P. Acoustic and Paraverbal Indicators of Persuasiveness in Social Multimedia, IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia 2015 (IEEE ICASSP 2015) (Oral) [* - indicates equal contribution] [pdf] [Conference Link] [Project]

2014

Chatterjee M.*, Ghosh S.*, Morency L.P. A Multimodal Context Based Approach for Distress Assessment, ACM International Conference on Multimodal Interaction, Istanbul, Turkey 2014 (ACM ICMI 2014) [* - indicates equal contribution] [pdf] [Conference Link] [Project]
Chatterjee M., Stratou G., Scherer S., Morency L.P. Context-Based Signal Descriptors of Heart-Rate Variability for Anxiety Assessment, IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy 2014 (IEEE ICASSP 2014) [pdf] [Conference Link] [Project]
Park S., Chatterjee M.*, Shim H.S.*, Sagae K., Morency L.P. Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach, ACM International Conference on Multimodal Interaction, Istanbul, Turkey 2014 (ACM ICMI 2014) (Oral) [* - indicates equal contribution] [pdf] [Conference Link] [Project]

2012

Agarwal S., Chatterjee M., Mukherjee D.P. Recognizing Facial Expression Using a Novel Shape Motion Descriptor, Indian Conference on Computer Vision, Graphics and Image Processing, Mumbai, India 2012 (ICVGIP 2012) [pdf] [Conference Link] [Project]
Agarwal S., Chatterjee M., Mukherjee D.P. Synthesis of Emotional Expression Using a Novel Shape Motion Descriptor, Indian Conference on Computer Vision, Graphics and Image Processing, Mumbai, India 2012 (ICVGIP 2012) [pdf] [Conference Link] [Project]

Workshops

2023

Chatterjee M.*, Sharma M.*, Peng K.C., Lohit S., Jones M. Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection, ICCV Workshop on Representation Learning with Very Limited Images, Paris, France 2023 (ICCVW 2023) (Oral) [* - indicates equal contribution] [pdf] [Conference Link]

2014

Chatterjee M., Park S.*, Shim H.S.*, Sagae K., Morency L.P. Verbal Behavior and Persuasiveness in Online Multimedia Content, COLING Workshop on Social NLP, Dublin, Ireland 2014 (COLING - SocialNLP 2014) (Oral) [* - indicates equal contribution] [pdf] [Conference Link] [Project]

Thesis

PhD: Efficient Audio-Visual Representations for Reasoning and Synthesis Tasks [pdf]