Title: AI for Disaster Management
Presenters:
Dr. Rakesh Chandra Balabantaray
Associate Professor and Dean (Academics), IIIT Bhubaneswar
Dr. Nayan Ranjan Paul
Assistant Professor, Silicon University, Bhubaneswar
Dr. Asutosh Bhoi
Assistant Professor, GITAM University, Visakhapatnam
• Introduction to Multimodal Classification with Humanitarian Categorization in Disaster Response (15 minutes):
This session provides an introduction to the task of classifying disaster-related social media posts into predefined humanitarian categories using both textual and visual data. Participants will explore the real-world importance of analyzing multimodal content (tweets and images) during natural or man-made disasters to identify critical information such as injured or missing persons, infrastructure damage, and calls for rescue or donation. The discussion will focus on the challenges of multimodal alignment, variability in visual-textual relevance, and the role of category-specific interpretation in high-stakes scenarios. By the end of this segment, attendees will understand how humanitarian classification contributes to faster and more effective disaster response and how multimodal AI systems can improve the accuracy and timeliness of such efforts.
• Python & Environment Setup (5 minutes):
A brief walkthrough to help participants set up the Python programming environment in Google Colab, including the installation of key libraries such as Hugging Face Transformers, PyTorch, and OpenCV, and the loading of relevant pretrained models and datasets (e.g., CrisisMMD).
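For reference, a minimal setup cell along the following lines could open the notebook; the exact package versions and the bert-base-uncased checkpoint are illustrative assumptions rather than the tutorial's fixed choices, and CrisisMMD itself is assumed to be downloaded separately into the Colab workspace.

```python
# Minimal Colab setup sketch (notebook cell); versions and checkpoints may differ.
!pip install -q transformers torch torchvision opencv-python

import torch
from transformers import AutoTokenizer, AutoModel

# Check whether the Colab runtime provides a GPU.
print("CUDA available:", torch.cuda.is_available())

# Load an illustrative pretrained text encoder reused in later sessions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")

# CrisisMMD is distributed separately (e.g., via the CrisisNLP resources) and is
# assumed to already sit in the Colab workspace before the hands-on sessions.
```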
• Q&A and Warm-Up (5 minutes):
This brief segment allows participants to clarify initial concepts, verify their environment setup, and engage in a short discussion to ensure readiness for the hands-on sessions that follow.
• Understanding Humanitarian Categorization (10 minutes):
This session explores the core idea of humanitarian categorization, which involves assigning disaster-related social media content to predefined actionable categories such as injured or dead people, infrastructure damage, rescue efforts, or missing individuals. Participants will learn how these categories were derived (e.g., from CrisisMMD) and why they are critical for prioritizing aid, coordinating rescue operations, and filtering non-relevant content during emergencies. The discussion will cover challenges such as category overlap, ambiguity in visual-textual pairs, and the need for precision in time-sensitive crisis response systems.
• Real-World Examples (5 minutes):
This segment will present real examples of tweets and accompanying images labeled with humanitarian categories. These examples will highlight difficulties such as semantically misaligned text and image, ambiguous or overlapping labels, and non-humanitarian noise. Participants will gain a practical understanding of how classification models interpret such data, and how these interpretations influence decision-making in disaster response scenarios.
• Q&A and Reflection (5 minutes):
An interactive session for participants to raise questions, share their observations, and reflect on the importance of accurate categorization in real-world deployments. This is also an opportunity to connect theoretical understanding with the operational needs of humanitarian computing.
Session 3: Text and Image Preprocessing for Multimodal Classification (45 minutes)
• Preprocessing Disaster-Related Text Data (15 minutes):
This segment begins with a focus on the preprocessing of tweet text. Participants will learn how to remove noise such as URLs, hashtags, emojis, and user mentions. The session then introduces basic text normalization steps like lowercasing and punctuation handling, followed by tokenization. Participants will also learn how to generate meaningful vector representations of the cleaned text using pretrained transformer-based models like BERT, which capture contextual and semantic nuances necessary for disaster-related classification.
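A cleaning-and-embedding sketch such as the one below could accompany this segment; the regular expressions, the crude non-ASCII emoji removal, and the bert-base-uncased checkpoint are illustrative assumptions, not the tutorial's exact pipeline.

```python
import re
import torch
from transformers import AutoTokenizer, AutoModel

def clean_tweet(text: str) -> str:
    """Strip URLs, user mentions, hashtag symbols, and emojis, then lowercase."""
    text = re.sub(r"http\S+|www\.\S+", "", text)    # URLs
    text = re.sub(r"@\w+", "", text)                 # user mentions
    text = text.replace("#", "")                     # keep hashtag words, drop '#'
    text = text.encode("ascii", "ignore").decode()   # crude emoji/non-ASCII removal
    return text.lower().strip()

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed_tweet(text: str) -> torch.Tensor:
    """Return the contextual [CLS] embedding of a cleaned tweet (768-d)."""
    inputs = tokenizer(clean_tweet(text), return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze(0)

example = "Bridge collapsed near the river! #earthquake https://t.co/xyz @reporter"
print(embed_tweet(example).shape)  # torch.Size([768])
```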
• Image Preprocessing and Feature Extraction (15 minutes):
In this segment, participants will work with the image modality. They will learn how to resize, normalize, and standardize disaster-related input images. Using pretrained CNN models such as ResNet, the session will walk through how to extract high-level features from images, which are then prepared for downstream multimodal modeling. The discussion will include how image features can complement textual signals, especially in categories like “vehicle damage” or “infrastructure collapse.”
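The image side could be sketched as follows, using standard ImageNet preprocessing and ResNet-50 with its classification head replaced by an identity layer; the weight choice and the example file path are assumptions made for illustration.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet-style preprocessing: resize, center-crop, convert, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained ResNet-50 used purely as a feature extractor.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()   # expose the 2048-d pooled feature instead of logits
resnet.eval()

def extract_image_features(path: str) -> torch.Tensor:
    """Return a 2048-d feature vector for one disaster-related image."""
    image = Image.open(path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)    # shape [1, 3, 224, 224]
    with torch.no_grad():
        features = resnet(batch)
    return features.squeeze(0)                # shape [2048]

# Hypothetical file path, shown for illustration only.
# features = extract_image_features("crisismmd_images/flood_0001.jpg")
```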
• Aligning Text and Image Modalities (10 minutes):
Once both text and image features are processed, this part of the session shows how to align them as paired inputs to multimodal models. Participants will explore the structure of aligned datasets and how models expect text-image pairs to be formatted for joint learning and classification.
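A minimal PyTorch Dataset can make the pairing concrete; the field names text_emb, image_feat, and label are placeholders for however the notebook actually stores the precomputed features.

```python
import torch
from torch.utils.data import Dataset

class CrisisPairDataset(Dataset):
    """Pairs a precomputed tweet embedding with its image feature and label.

    `records` is assumed to be a list of dicts with keys 'text_emb' (768-d
    tensor), 'image_feat' (2048-d tensor), and 'label' (integer category id).
    """
    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        r = self.records[idx]
        return r["text_emb"], r["image_feat"], torch.tensor(r["label"])
```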
• Live Hands-On Coding in Google Colab (5 minutes):
The final segment will be a short walkthrough of the provided Colab notebook where participants will execute preprocessing code, generate embeddings, extract features, and build aligned inputs for future sessions.
• Understanding Fusion-Based Modeling Strategies (15 minutes):
This session begins with a conceptual overview of how textual and visual information can be combined for effective classification. Participants will learn about three widely used fusion strategies (a brief code sketch contrasting early and late fusion follows this list):
· Early Fusion, where BERT and ResNet features are concatenated at the embedding level before being passed to a classifier.
· Late Fusion, where text and image models operate independently and their predictions are merged at the decision layer.
· Hybrid Fusion, which combines intermediate features with multiple levels of interaction.
By the end of this segment, attendees will understand how fusion design impacts model performance, especially in the context of noisy or imbalanced modalities during disasters.
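Two of the strategies, early and late fusion, could be sketched as below, assuming 768-d BERT and 2048-d ResNet features as in the preprocessing session; the layer sizes and the simple logit averaging are illustrative choices, and hybrid fusion is omitted for brevity.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 8  # humanitarian categories in the CrisisMMD subset

class EarlyFusionClassifier(nn.Module):
    """Concatenate BERT (768-d) and ResNet (2048-d) features, then classify."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(768 + 2048, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, NUM_CLASSES),
        )

    def forward(self, text_emb, image_feat):
        return self.classifier(torch.cat([text_emb, image_feat], dim=-1))

class LateFusionClassifier(nn.Module):
    """Independent per-modality heads whose logits are merged at the decision layer."""
    def __init__(self):
        super().__init__()
        self.text_head = nn.Linear(768, NUM_CLASSES)
        self.image_head = nn.Linear(2048, NUM_CLASSES)

    def forward(self, text_emb, image_feat):
        return 0.5 * (self.text_head(text_emb) + self.image_head(image_feat))
```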
• Real-World Implementation with BERT and ResNet (20 minutes):
In this hands-on portion, participants will construct a multimodal model using pretrained BERT for processing tweet text and ResNet for image feature extraction. The two representations will be fused and passed through dense layers to predict one of eight humanitarian categories. Participants will train and evaluate the model on a labeled CrisisMMD subset, using precision, recall, and F1-score as evaluation metrics. The exercise highlights model construction, training loop management, and evaluation strategy in a disaster-response context.
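A compact training-and-evaluation loop along these lines could underpin the exercise; it reuses the CrisisPairDataset and EarlyFusionClassifier sketches above, and the random tensors, epoch count, learning rate, and batch size are placeholders rather than the tutorial's actual data and hyperparameters.

```python
import torch
from torch.utils.data import DataLoader
from sklearn.metrics import precision_recall_fscore_support

# Assumes CrisisPairDataset, EarlyFusionClassifier, and NUM_CLASSES from the
# earlier sketches; random tensors stand in for real CrisisMMD features here.
def toy_split(n):
    return CrisisPairDataset([{"text_emb": torch.randn(768),
                               "image_feat": torch.randn(2048),
                               "label": int(torch.randint(0, NUM_CLASSES, (1,)))}
                              for _ in range(n)])

train_loader = DataLoader(toy_split(128), batch_size=32, shuffle=True)
test_loader = DataLoader(toy_split(32), batch_size=32)

model = EarlyFusionClassifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                      # short run, sized for a Colab session
    for text_emb, image_feat, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(text_emb, image_feat), labels)
        loss.backward()
        optimizer.step()

# Evaluate with precision, recall, and macro-averaged F1.
model.eval()
preds, golds = [], []
with torch.no_grad():
    for text_emb, image_feat, labels in test_loader:
        preds.extend(model(text_emb, image_feat).argmax(dim=-1).tolist())
        golds.extend(labels.tolist())

p, r, f1, _ = precision_recall_fscore_support(golds, preds, average="macro",
                                              zero_division=0)
print(f"precision={p:.3f}  recall={r:.3f}  f1={f1:.3f}")
```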
• Q&A and Live Troubleshooting (10 minutes):
The session concludes with an open discussion where participants can clarify fusion strategies, ask technical questions about implementation, and troubleshoot issues in model performance. This segment provides an opportunity to review trade-offs between model complexity and efficiency and refine understanding of real-world deployment challenges.
• Manual Annotation with Doccano (15 minutes):
This session begins with a live demonstration of Doccano, an open-source annotation tool. Participants will learn how to use it to manually annotate tweet-image pairs with humanitarian labels such as infrastructure damage, injured individuals, or rescue efforts. The demo will guide attendees through uploading a dataset, configuring a multi-label annotation interface, and tagging both textual and visual inputs. This segment emphasizes the importance of high-quality labeled data for training robust multimodal classifiers.
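One plausible way to prepare tweet-image pairs for upload is a JSONL file like the one written below; the label names are shortened for readability, carrying the image path as an extra field is a workaround rather than a documented Doccano feature, and import behavior can differ across Doccano versions.

```python
import json

# Hypothetical tweet-image pairs; the image path rides along as an extra field
# so annotators can open the paired image while labeling the text.
examples = [
    {"text": "Bridge collapsed near the river, several cars trapped",
     "image_path": "images/quake_0231.jpg",
     "label": ["infrastructure_damage"]},
    {"text": "Please donate blankets and drinking water for flood victims",
     "image_path": "images/flood_0142.jpg",
     "label": ["rescue_volunteering_or_donation_effort"]},
]

# Doccano accepts JSONL uploads for (multi-label) text classification projects;
# how extra keys such as image_path are surfaced depends on the Doccano version.
with open("crisis_annotation.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```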
• Label Ambiguity and Annotation Best Practices (15 minutes):
Participants will then explore the challenges of labeling multimodal content. Examples will include ambiguous tweets (e.g., vague text with unrelated images) and posts that could fall into multiple humanitarian categories. The session will provide best practices for consistent annotation, such as inter-annotator agreement scoring, consistency checks across annotators, and guidance on when to use fallback categories like Other Relevant Information. This discussion helps participants understand the subjectivity involved in human labeling and the need for clear guidelines.
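Agreement scoring can be made concrete with a quick Cohen's kappa check; the two annotator label lists below are hypothetical toy inputs for illustration only, not data from the workshop.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical single-label decisions from two annotators on the same eight posts.
annotator_a = ["injured_or_dead_people", "infrastructure_damage", "other_relevant",
               "missing_people", "rescue_effort", "infrastructure_damage",
               "not_humanitarian", "rescue_effort"]
annotator_b = ["injured_or_dead_people", "infrastructure_damage", "rescue_effort",
               "missing_people", "rescue_effort", "other_relevant",
               "not_humanitarian", "rescue_effort"]

# Cohen's kappa corrects raw agreement for chance; low values flag category pairs
# that need clearer annotation guidelines.
print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```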
• Error Analysis and Misclassification Insights (10 minutes):
To close the session, participants will evaluate model predictions against a custom-built test set. They will identify and discuss common misclassification patterns such as confusing “missing people” with “affected individuals” or false positives in non-humanitarian categories. Through this analysis, attendees will gain insight into model weaknesses and learn how annotation quality directly influences evaluation metrics.
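A classification report and confusion matrix over the collected predictions is one way to surface these patterns; the category list below follows CrisisMMD's humanitarian-task naming (the ordering is an assumption and must match the label encoding), and golds/preds refer to the lists gathered in the earlier evaluation sketch.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Humanitarian categories as named in CrisisMMD; the order is illustrative and
# must match the integer encoding used when the labels were created.
CATEGORIES = [
    "affected_individuals", "infrastructure_and_utility_damage",
    "injured_or_dead_people", "missing_or_found_people",
    "rescue_volunteering_or_donation_effort", "vehicle_damage",
    "other_relevant_information", "not_humanitarian",
]

# `golds` and `preds` come from the earlier evaluation sketch; passing `labels=`
# keeps all eight rows even if a class never appears in this test split.
label_ids = list(range(len(CATEGORIES)))
print(classification_report(golds, preds, labels=label_ids,
                            target_names=CATEGORIES, zero_division=0))

# Rows of the confusion matrix expose systematic mix-ups, e.g. how often
# "missing_or_found_people" items are predicted as "affected_individuals".
print(confusion_matrix(golds, preds, labels=label_ids))
```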
• Research Opportunities in Humanitarian NLP and Computer Vision (10 minutes):
This closing session highlights emerging research directions at the intersection of NLP, computer vision, and humanitarian AI. Participants will explore open problems such as building context-aware models for crisis detection, developing cross-modal alignment techniques, and creating lightweight, deployable models suitable for field operations. The session also emphasizes the growing need for collaborative tools that can help integrate AI outputs with real-time decision-making in disaster relief environments.
• Challenges and Expansion Strategies (5 minutes):
Attendees will discuss unresolved challenges, including real-time model deployment under bandwidth and latency constraints, multilingual scalability across under-resourced languages, and mitigating misinformation and bias in crisis-related content. Suggestions will be provided for expanding datasets with more diverse disaster scenarios, underrepresented languages, and community-sourced annotations to improve model generalization.
• Participant Feedback and Q&A (5 minutes):
The workshop concludes with an interactive feedback round, allowing participants to share takeaways, ask final questions, and discuss how they plan to apply the tutorial’s methods in their own work. This segment fosters community exchange and encourages future collaboration and contribution to the field of AI for disaster management.