Diffusion Meets Few-shot Class Incremental Learning [ArXiv'25] [paper]
*Note: Please refer to the full paper for extended experiments, ablations, and other details.
Who Should Read This Paper
- Researchers and Practitioners in Continual Learning: Those investigating methods to reduce catastrophic forgetting while adapting to new classes with minimal data.
- Few-Shot Learning Enthusiasts: Individuals exploring ways to handle scarce data (e.g., 5-shot scenarios) across multiple incremental sessions.
- Generative Model & Vision Experts: Anyone interested in leveraging text-to-image diffusion backbones (like Stable Diffusion) for downstream classification or incremental learning tasks.
- Industry Teams Developing Scalable Incremental Solutions: Groups aiming to continually update large-scale models efficiently (e.g., in robotics, surveillance, or rapidly changing data domains).
What the Paper Covers
- A New Perspective on FSCIL via Diffusion : We introduce Diffusion-FSCIL, a novel framework that uses a text-to-image diffusion model (Stable Diffusion) as a frozen backbone for Few-Shot Class-Incremental Learning (FSCIL). This approach breaks from traditional discriminative backbones (e.g., ResNet, ViT) by exploiting three core diffusion properties:
- Multi-Scale Feature Extraction: The UNet architecture in Stable Diffusion provides representations at various layers (coarse-to-fine).
- Generative Replay: Text-conditioned generative features (i.e., features synthesized purely from learned text prompts) allow class replay without storing real or synthesized images.
- Noise-Augmented Features: Even with very few samples, partial noise injection at intermediate diffusion steps diversifies representations and addresses overfitting.
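The noise-augmentation idea can be sketched with the standard DDPM forward process, which partially noises a clean feature to an intermediate timestep t. This is a minimal illustration under assumed values (linear beta schedule, 1000 steps, toy feature dimensions), not the paper's actual implementation:

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_augment(features, t, alpha_bar, rng):
    """Noise clean features to intermediate step t:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(features.shape)
    return np.sqrt(alpha_bar[t]) * features + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
clean = rng.standard_normal((4, 64))  # e.g. 4 few-shot samples, 64-dim features
# Several distinct noisy views of the same scarce samples:
augmented = [noise_augment(clean, t=250, alpha_bar=alpha_bar, rng=rng) for _ in range(5)]
```

Each call draws fresh noise, so a handful of real samples yields many diversified feature views, which is the mechanism used to counter overfitting in the few-shot regime.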
Real-World Applications (From my perspective)
- Continual Image Classification in Robotics: Robots operating in changing environments can learn novel objects on-the-fly without overhauling older knowledge.
- Medical Imaging & Diagnostics: As new rare conditions become available in small numbers, our approach can quickly adapt while preserving performance on existing classes.
- Surveillance & Security: Continually adapt to new threat categories or object types in video analytics with minimal incremental samples.
- Rapid Deployment in Evolving Domains: When data distribution shifts (e.g., e-commerce platforms with emerging product categories), Diffusion-FSCIL can incorporate new classes efficiently.
Key Strengths
- Better Feature Diversity : Multi-scale feature extraction (inversion + generation) and text conditioning produce robust, complementary representations.
- Unlimited, Storage-Free Replay : Class-specific prompts generate replay features _without_ saving any real or synthetic images.
- Superior Incremental Retention : Outperforms prior methods across classic FSCIL benchmarks (CUB-200, miniImageNet, CIFAR-100), preserving knowledge of old classes.
- Lightweight & Modular :
- Only ~6M parameters are trained; the large diffusion backbone remains frozen.
- Allows fast adaptation, avoiding large memory overhead or complex replay buffers.
- General Applicability : Our pipeline is broadly applicable to other incremental or open-world settings where generative flexibility can be leveraged.
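The frozen-backbone setup above follows a common PyTorch pattern: disable gradients on the large pretrained model and train only a small module on top. The modules and dimensions below are placeholders, not the paper's actual SD UNet or adapter design; this is a sketch of the freezing pattern only:

```python
import torch.nn as nn

# Illustrative stand-in for the frozen generative backbone (NOT the real SD UNet).
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in backbone.parameters():
    p.requires_grad = False  # backbone stays frozen across all sessions

# Small trainable module on top (dimensions are placeholders).
head = nn.Linear(512, 100)

def count_trainable(module: nn.Module) -> int:
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```

An optimizer is then built only over `head.parameters()`, so each incremental session updates a few parameters while the backbone's pretrained knowledge is untouched.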
Main Figure (Overall architecture)