Title: AI for Disaster Management
Presenters:
Dr. Rakesh Chandra Balabantaray
Associate Professor and Dean (Academics), IIIT Bhubaneswar
Dr. Nayan Ranjan Paul
Assistant Professor, Silicon University, Bhubaneswar
Dr. Asutosh Bhoi
Assistant Professor, GITAM University, Visakhapatnam
Overview of Humanitarian Categorization in Disaster Response:
Participants are introduced to the task of classifying disaster-related social media posts into actionable categories such as injured people, infrastructure damage, rescue efforts, and missing individuals. The session highlights the real-world importance of multimodal analysis (tweets + images) for faster decision-making and better allocation of resources during emergencies.
Challenges in Classification:
The discussion covers key issues such as label ambiguity, noise in social media data, semantically unrelated images, and category overlaps that make classification difficult.
Illustrative Real-World Examples:
Several tweet–image pairs from real disaster datasets (e.g., CrisisMMD) are shown to highlight the complexity of multimodal classification. Participants learn to visually and textually interpret the examples to understand their humanitarian relevance.
Python & Environment Setup:
A brief walkthrough to help participants set up the Python programming environment in Google Colab, including installation of key libraries such as Hugging Face Transformers, PyTorch, and OpenCV, as well as the import of relevant pretrained models and datasets (e.g., CrisisMMD).
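A setup cell along the following lines could open the notebook; the exact package list is illustrative rather than prescribed by the session, and CrisisMMD must be obtained separately from its official release.

# Colab setup cell: install the libraries used in later segments.
# Versions are left unpinned here; pin them for reproducibility.
!pip install -q transformers torch torchvision opencv-python-headless

import torch
import cv2
from transformers import AutoTokenizer, AutoModel

print("Torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# CrisisMMD is downloaded and unpacked separately (e.g., into the Colab
# workspace) before the hands-on exercises begin.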
Preprocessing Disaster-Related Text Data:
This segment begins with a focus on the preprocessing of tweet text. Participants will learn how to remove noise such as URLs, hashtags, emojis, and user mentions. The session then introduces basic text normalization steps like lowercasing and punctuation handling, followed by tokenization. Participants will also learn how to generate meaningful vector representations of the cleaned text using pretrained transformer-based models like BERT, which capture contextual and semantic nuances necessary for disaster-related classification.
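A minimal preprocessing and embedding sketch in this spirit is shown below; the regular expressions, the bert-base-uncased checkpoint, and the use of the [CLS] vector are illustrative choices, not fixed by the session plan.

import re
import torch
from transformers import AutoTokenizer, AutoModel

def clean_tweet(text: str) -> str:
    """Remove URLs, user mentions, hashtag symbols, and emojis, then lowercase."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)    # URLs
    text = re.sub(r"@\w+", " ", text)                # user mentions
    text = re.sub(r"#", " ", text)                   # keep hashtag words, drop '#'
    text = text.encode("ascii", "ignore").decode()   # crude emoji / non-ASCII removal
    text = re.sub(r"[^\w\s]", " ", text)             # punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed_tweet(text: str) -> torch.Tensor:
    """Return the 768-dimensional [CLS] embedding of a cleaned tweet."""
    inputs = tokenizer(clean_tweet(text), return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)

example = "Rescue teams needed at the bridge! #flood https://t.co/xyz @localnews"
print(embed_tweet(example).shape)  # torch.Size([1, 768])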
Image Preprocessing and Feature Extraction:
In this segment, participants will work with the image modality. They will learn how to resize, normalize, and standardize input disaster-related images. Using pretrained CNN models such as ResNet, the session will walk through how to extract high-level features from images, which are then prepared for downstream multimodal modeling. The discussion will include how image features can complement textual signals, especially in categories like “vehicle damage” or “infrastructure collapse.”
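The image pipeline could look roughly like the sketch below, which assumes ResNet-50 from torchvision with its classification head replaced by an identity layer so that a 2048-dimensional feature vector is returned per image.

import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet-style preprocessing: resize, crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained ResNet-50 with the final classification layer removed,
# so the forward pass yields high-level image features.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()

def extract_image_features(path: str) -> torch.Tensor:
    """Return a 2048-dimensional feature vector for one disaster image."""
    img = Image.open(path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)   # shape: (1, 3, 224, 224)
    with torch.no_grad():
        return resnet(batch)               # shape: (1, 2048)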
Live Hands-On Coding in Google Colab:
The final segment will be a short walkthrough of the provided Colab notebook where participants will execute preprocessing code, generate embeddings, extract features, and build aligned inputs for future sessions.
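As an illustration of what "aligned inputs" might look like, the sketch below pairs each tweet with its image and label inside a small PyTorch Dataset; the record format and the helpers embed_tweet and extract_image_features (from the earlier sketches) are assumptions made for the example.

import torch
from torch.utils.data import Dataset

class CrisisPairDataset(Dataset):
    """Pairs each tweet embedding with its image features and humanitarian label.

    `records` is assumed to be a list of dicts with keys 'tweet_text',
    'image_path', and 'label' built from the CrisisMMD annotations.
    """
    def __init__(self, records, label2id):
        self.records = records
        self.label2id = label2id

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        text_emb = embed_tweet(rec["tweet_text"]).squeeze(0)              # (768,)
        img_feat = extract_image_features(rec["image_path"]).squeeze(0)   # (2048,)
        label = torch.tensor(self.label2id[rec["label"]])
        return text_emb, img_feat, label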
Session 3: Building Multimodal Classification Models for Humanitarian Categorization (15 minutes)
Understanding Fusion-Based Modelling Strategies:
This session begins with a conceptual overview of how textual and visual information can be combined for effective classification. Participants will learn about three widely used fusion strategies:
Early Fusion, where BERT and ResNet features are concatenated at the embedding level before being passed to a classifier.
Late Fusion, where text and image models operate independently and their predictions are merged at the decision layer.
Hybrid Fusion, which combines intermediate representations from both modalities and allows interaction at multiple levels of the network.
By the end of this segment, attendees will understand how fusion design impacts model performance, especially in the context of noisy or imbalanced modalities during disasters.
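A minimal early-fusion sketch, assuming the 768-dimensional BERT embeddings and 2048-dimensional ResNet features from the preprocessing session, could look as follows; the hidden size and dropout rate are illustrative.

import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    """Concatenates BERT text embeddings (768-d) and ResNet image features
    (2048-d), then maps the fused vector to the humanitarian categories."""
    def __init__(self, num_classes: int = 8):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(768 + 2048, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, text_emb, img_feat):
        fused = torch.cat([text_emb, img_feat], dim=1)  # early fusion at the embedding level
        return self.classifier(fused)

# Late fusion, by contrast, would train separate text and image classifiers
# and merge their per-class predictions (e.g., by averaging) at decision time.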
Real-World Implementation with BERT and ResNet:
In this hands-on portion, participants will construct a multimodal model using pretrained BERT for processing tweet text and ResNet for image feature extraction. The two representations will be fused and passed through dense layers to predict one of eight humanitarian categories. Participants will train and evaluate the model on a labeled CrisisMMD subset, using precision, recall, and F1-score as evaluation metrics. The exercise highlights model construction, training loop management, and evaluation strategy in a disaster-response context.
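One possible shape for the training and evaluation loop is sketched below; train_dataset and val_dataset are assumed to be instances of the aligned-input dataset from the earlier sketch, and the optimizer, learning rate, and epoch count are placeholder choices.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from sklearn.metrics import precision_recall_fscore_support

model = EarlyFusionClassifier(num_classes=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Training loop over the labeled CrisisMMD subset.
for epoch in range(3):
    model.train()
    for text_emb, img_feat, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(text_emb, img_feat), labels)
        loss.backward()
        optimizer.step()

# Evaluation on a held-out split using precision, recall, and F1-score.
model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for text_emb, img_feat, labels in DataLoader(val_dataset, batch_size=32):
        preds = model(text_emb, img_feat).argmax(dim=1)
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

precision, recall, f1, _ = precision_recall_fscore_support(
    all_labels, all_preds, average="macro")
print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")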
Manual Annotation with Doccano:
This session begins with a live demonstration of Doccano/Label Studio, open-source annotation tools. Participants will learn how to use them to manually annotate tweets and images with humanitarian labels such as infrastructure damage, injured individuals, or rescue efforts. The demo will guide attendees through uploading a dataset, configuring a multi-label annotation interface, and tagging both textual and visual inputs. This segment emphasizes the importance of high-quality labeled data for training robust multimodal classifiers.
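For the upload step, participants might prepare tweets as a JSONL file similar to the sketch below; the category names and example records are hypothetical, and Label Studio would expect its own task format instead.

import json

# Hypothetical records to be uploaded to Doccano for manual multi-label tagging.
tweets = [
    {"text": "Bridge collapsed near the river, several people trapped",
     "label": ["infrastructure_damage", "rescue_efforts"]},
    {"text": "Volunteers distributing food at the shelter",
     "label": ["rescue_efforts"]},
]

# Doccano's text-classification projects accept JSONL with "text" and "label" fields.
with open("crisis_tweets.jsonl", "w", encoding="utf-8") as f:
    for rec in tweets:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")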
Research Opportunities:
Brief overview of ongoing research in multimodal humanitarian AI, including challenges like multilingual scalability, cross-modal alignment, and lightweight deployment.
Participant Feedback:
Participants share takeaways, raise final questions, and discuss how they plan to use these techniques in their own projects.