Image Generation with Diffusion Model Guidance
Diffusion models generate images by progressively refining random noise to match a target concept, typically specified via a text prompt. This assignment implements a pipeline where the Score Distillation Sampling (SDS) loss from a pre-trained diffusion model is used to optimize images, 3D mesh textures, and neural radiance fields (NeRFs) so their outputs align with textual descriptions.
SDS Loss + Image Optimization
Implemented the SDS loss to compute gradients that nudge the pixels of a 2D image toward a given text prompt, with the loss signal extracted from a pre-trained diffusion model. Classifier-free guidance combines the conditional (positive-prompt) and unconditional (negative-prompt) noise predictions, following the structure of DreamFusion. Image optimization is handled in Q21_image_optimization.py, with the loss computed in SDS.py. The results are shown in the GIFs below:
Text Prompt: a fire breathing dragon sitting on a heap of gold
Text Prompt: Spiderman swinging through New York
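The core of the SDS loss above can be sketched as follows. This is a minimal, self-contained illustration, not the assignment's actual code: `noise_predictor` stands in for the pre-trained diffusion U-Net, and the linear beta schedule is an assumption (the real pipeline uses the scheduler shipped with the pre-trained model).

```python
import torch

def sds_loss(latents, noise_predictor, text_emb, uncond_emb,
             guidance_scale=100.0, num_train_timesteps=1000):
    device = latents.device
    # DDPM-style linear beta schedule (assumption for this sketch).
    betas = torch.linspace(1e-4, 0.02, num_train_timesteps, device=device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    # Sample a random timestep (avoiding the extremes) and noise the latents.
    t = torch.randint(20, num_train_timesteps - 20, (1,), device=device)
    a_t = alpha_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(latents)
    noisy = a_t.sqrt() * latents + (1.0 - a_t).sqrt() * noise

    # Predict noise with and without the text condition; no grad through the model.
    with torch.no_grad():
        eps_cond = noise_predictor(noisy, t, text_emb)
        eps_uncond = noise_predictor(noisy, t, uncond_emb)

    # Classifier-free guidance: push the conditional prediction away from
    # the unconditional one.
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient w(t) * (eps_hat - eps), wrapped as a surrogate MSE loss
    # whose gradient w.r.t. `latents` equals `grad` exactly.
    w = 1.0 - a_t
    grad = w * (eps - noise)
    target = (latents - grad).detach()
    return 0.5 * torch.nn.functional.mse_loss(latents, target, reduction="sum")
```

The surrogate-loss trick (detaching the target) is what lets a standard optimizer apply the SDS gradient, since the gradient itself never flows back through the diffusion model.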
Texture Map Optimization for Mesh
Applied the SDS loss to optimize per-vertex colors on a fixed 3D mesh, so the learned texture matches an arbitrary text prompt. At each iteration, the mesh is rendered from sampled viewpoints, the renderings are scored against the prompt via the diffusion model, and the ColorField outputs are iteratively updated. Mesh rendering and optimization are implemented in Q22_nerf_optimization.py using pytorch3d.
Initial Mesh
Text prompt: Black and white cow
Text prompt: Cow with tiger skin
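The texture optimization loop can be sketched as below. The names `render_fn` and `sds_grad_fn` are hypothetical stand-ins: in the actual assignment the renderer is a pytorch3d mesh renderer and the gradient comes from the SDS loss in SDS.py.

```python
import torch

def optimize_vertex_colors(vertices, render_fn, sds_grad_fn,
                           n_iters=100, lr=1e-2):
    # Unconstrained per-vertex parameters; sigmoid keeps colors in [0, 1].
    raw = torch.full((vertices.shape[0], 3), 0.5, requires_grad=True)
    opt = torch.optim.Adam([raw], lr=lr)
    for _ in range(n_iters):
        # Render the mesh with the current colors (the real renderer also
        # samples a random camera viewpoint each iteration).
        image = render_fn(vertices, torch.sigmoid(raw))
        # SDS gradient w.r.t. the rendering, applied via a surrogate loss
        # whose gradient w.r.t. `image` equals `grad`.
        grad = sds_grad_fn(image)
        loss = (grad.detach() * image).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(raw).detach()
```

Because the renderer is differentiable, the SDS gradient on the rendered image backpropagates through rasterization into the per-vertex colors.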