CVPR 2026 WORKSHOP
Multimodal remote sensing has the potential to deliver comprehensive Earth observation by combining complementary sensor capabilities, yet fundamental challenges prevent this potential from being realized. While computer vision has made remarkable progress in multimodal learning with aligned, simultaneously-collected data (e.g., RGB-D cameras), remote sensing operates under far more challenging constraints. Satellites collect data asynchronously, at different resolutions, and through fundamentally different imaging physics. For example, synthetic aperture radar (SAR) actively transmits microwave pulses whereas electro-optical (EO) sensors passively capture reflected sunlight. These disparities create a critical gap between theoretical multimodal methods and practical Earth observation systems.
The core challenge lies not in sensor alignment or co-registration, but in learning meaningful representations across modalities that differ in their fundamental observational properties. When monitoring dynamic Earth processes, we rarely have the luxury of complete, synchronized observations. Instead, we must extract insights from whatever data is available: for example, pre-event optical imagery paired with post-event SAR, high-resolution commercial imagery combined with frequent but coarse images from public satellites, or clear-sky observations from weeks apart bracketing a critical cloudy period.
The goal of this workshop is to gather a wide audience of researchers in academia, industry, and related fields to address real-world constraints in multimodal remote sensing. While many recent multimodal remote sensing publications have focused on adapting computer vision algorithms to satellite imagery, fewer have tackled the unique challenges intrinsic to the remote sensing domain, such as irregular data collection intervals, disparities in modality and resolution, and non-ideal monitoring environments.
The workshop will solicit short papers applying machine learning to Earth and environmental science monitoring, with a particular focus on multimodal learning under imperfect conditions. Topics will include, but will not be limited to:
Multimodal fusion combining EO, SAR, LiDAR, and other sensors
Heterogeneous change detection across different modalities
Temporal analysis for event monitoring
Domain adaptation and cross-sensor generalization
Self-/unsupervised learning with limited data
Foundation models for remote sensing and Earth observation
Uncertainty quantification
Real-time multimodal satellite processing
Infrastructure monitoring and hazard prediction using incomplete data
Multimodal remote sensing analysis
Change detection and multi-temporal analysis
Geographic information science
Multimodal generative modeling
Multimodal representation learning
May 20 - Author notification
May 27 - Camera-ready deadline
We accept submissions of up to 8 pages (excluding references) on the aforementioned and related topics. We encourage authors to submit 4-page papers.
Submitted manuscripts should follow the CVPR 2026 paper template.
Accepted papers are not archival and will not be included in the proceedings of CVPR 2026.
Submissions will be rejected without review if they:
Contain more than 8 pages (excluding references)
Violate the double-blind or dual-submission policies
Submissions must contain substantial original content not submitted to any other conference, workshop, or journal.
Papers will be peer-reviewed under a double-blind policy and must be submitted online through the OpenReview submission website.
Miriam Cha, Technical staff, MIT Lincoln Laboratory
Gregory Angelides, Technical staff, MIT Lincoln Laboratory
Hamish Mitchell, Postdoctoral researcher, EAPS, MIT
Nathaniel Maidel, Master Sergeant, DAF MIT AI Accelerator
Sara Beery, Assistant professor, EECS, MIT
Taylor Perron, Professor, EAPS, MIT
Bill Freeman, Professor, EECS, MIT