Diffusion Meets Few-shot Class Incremental Learning [ArXiv'25] [paper]
*Note: Please refer to the full paper for extended experiments, ablations, and other details.
Who Should Read This Paper
- Researchers and Practitioners in Continual Learning: Those investigating methods to reduce catastrophic forgetting while adapting to new classes with minimal data.
- Few-Shot Learning Enthusiasts: Individuals exploring ways to handle scarce data (e.g., 5-shot scenarios) across multiple incremental sessions.
- Generative Model & Vision Experts: Anyone interested in leveraging text-to-image diffusion backbones (like Stable Diffusion) for downstream classification or incremental learning tasks.
- Industry Teams Developing Scalable Incremental Solutions: Groups aiming to continually update large-scale models efficiently (e.g., in robotics, surveillance, or rapidly changing data domains).
What the Paper Covers
- A New Perspective on FSCIL via Diffusion : We introduce Diffusion-FSCIL, a novel framework that uses a text-to-image diffusion model (Stable Diffusion) as a frozen backbone for Few-Shot Class-Incremental Learning (FSCIL). This approach breaks from traditional discriminative backbones (e.g., ResNet, ViT) by exploiting three core diffusion properties:
- Multi-Scale Feature Extraction: The UNet architecture in Stable Diffusion provides representations at various layers (coarse-to-fine).
- Generative Replay: Text-conditioned generative features (i.e., features synthesized purely from learned text prompts) allow class replay without storing real or synthesized images.
- Noise-Augmented Features: Even with very few samples, partial noise injection at intermediate diffusion steps diversifies representations and addresses overfitting.
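The noise-augmentation idea can be sketched with the standard DDPM forward process, which partially noises a clean feature to an intermediate timestep t. This is a minimal illustration under assumed values (linear beta schedule, 1000 steps, toy feature dimensions), not the paper's actual implementation:

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_augment(features, t, alpha_bar, rng):
    """Noise clean features to intermediate step t:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(features.shape)
    return np.sqrt(alpha_bar[t]) * features + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
clean = rng.standard_normal((4, 64))  # e.g. 4 few-shot samples, 64-dim features
# Several distinct noisy views of the same scarce samples:
augmented = [noise_augment(clean, t=250, alpha_bar=alpha_bar, rng=rng) for _ in range(5)]
```

Each call draws fresh noise, so a handful of real samples yields many diversified feature views, which is the mechanism used to counter overfitting in the few-shot regime.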
Real-World Applications (From my perspective)
- Continual Image Classification in Robotics: Robots operating in changing environments can learn novel objects on-the-fly without overhauling older knowledge.
- Medical Imaging & Diagnostics: As new rare conditions become available in small numbers, our approach can quickly adapt while preserving performance on existing classes.
- Surveillance & Security: Continually adapt to new threat categories or object types in video analytics with minimal incremental samples.
- Rapid Deployment in Evolving Domains: When data distribution shifts (e.g., e-commerce platforms with emerging product categories), Diffusion-FSCIL can incorporate new classes efficiently.
Key Strengths
- Better Feature Diversity : Multi-scale feature extraction (inversion + generation) and text conditioning produce robust, complementary representations.
- Unlimited, Storage-Free Replay : Class-specific prompts generate replay features _without_ saving any real or synthetic images.
- Superior Incremental Retention : Outperforms prior methods across classic FSCIL benchmarks (CUB-200, miniImageNet, CIFAR-100), preserving knowledge of old classes.
- Lightweight & Modular :
- Only ~6M parameters are trained; the large diffusion backbone remains frozen.
- Allows fast adaptation, avoiding large memory overhead or complex replay buffers.
- General Applicability : Our pipeline is broadly applicable to other incremental or open-world settings where generative flexibility can be leveraged.
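The frozen-backbone setup above follows a common PyTorch pattern: disable gradients on the large pretrained model and train only a small module on top. The modules and dimensions below are placeholders, not the paper's actual SD UNet or adapter design; this is a sketch of the freezing pattern only:

```python
import torch.nn as nn

# Illustrative stand-in for the frozen generative backbone (NOT the real SD UNet).
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in backbone.parameters():
    p.requires_grad = False  # backbone stays frozen across all sessions

# Small trainable module on top (dimensions are placeholders).
head = nn.Linear(512, 100)

def count_trainable(module: nn.Module) -> int:
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```

An optimizer is then built only over `head.parameters()`, so each incremental session updates a few parameters while the backbone's pretrained knowledge is untouched.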
Main Figure (Overall architecture)