Image Generation with Diffusion Model Guidance
Diffusion models generate images by progressively refining random noise to match a target concept, typically specified via a text prompt. This assignment implements a pipeline where the Score Distillation Sampling (SDS) loss from a pre-trained diffusion model is used to optimize images, 3D mesh textures, and neural radiance fields (NeRFs) so their outputs align with textual descriptions.
SDS Loss + Image Optimization
Implemented the SDS loss to compute gradients that nudge the pixels of a 2D image toward a given text prompt, with the loss signal extracted from a pre-trained diffusion model. Classifier-free guidance combines the conditional (positive-prompt) and unconditional (negative-prompt) noise predictions, following the structure of DreamFusion. Image optimization is handled in Q21_image_optimization.py, with the loss computed in SDS.py. The results are shown in the GIFs below:
Text Prompt: a fire breathing dragon sitting on a heap of gold
Text Prompt: Spiderman swinging through New York
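The core of the SDS loss above can be sketched as follows. This is a minimal, self-contained illustration, not the assignment's actual code: `noise_predictor` stands in for the pre-trained diffusion U-Net, and the linear beta schedule is an assumption (the real pipeline uses the scheduler shipped with the pre-trained model).

```python
import torch

def sds_loss(latents, noise_predictor, text_emb, uncond_emb,
             guidance_scale=100.0, num_train_timesteps=1000):
    device = latents.device
    # DDPM-style linear beta schedule (assumption for this sketch).
    betas = torch.linspace(1e-4, 0.02, num_train_timesteps, device=device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    # Sample a random timestep (avoiding the extremes) and noise the latents.
    t = torch.randint(20, num_train_timesteps - 20, (1,), device=device)
    a_t = alpha_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(latents)
    noisy = a_t.sqrt() * latents + (1.0 - a_t).sqrt() * noise

    # Predict noise with and without the text condition; no grad through the model.
    with torch.no_grad():
        eps_cond = noise_predictor(noisy, t, text_emb)
        eps_uncond = noise_predictor(noisy, t, uncond_emb)

    # Classifier-free guidance: push the conditional prediction away from
    # the unconditional one.
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient w(t) * (eps_hat - eps), wrapped as a surrogate MSE loss
    # whose gradient w.r.t. `latents` equals `grad` exactly.
    w = 1.0 - a_t
    grad = w * (eps - noise)
    target = (latents - grad).detach()
    return 0.5 * torch.nn.functional.mse_loss(latents, target, reduction="sum")
```

The surrogate-loss trick (detaching the target) is what lets a standard optimizer apply the SDS gradient, since the gradient itself never flows back through the diffusion model.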
Texture Map Optimization for Mesh
Applied the SDS loss to optimize per-vertex colors on a fixed 3D mesh, so the learned texture matches an arbitrary text prompt. At each iteration, the mesh is rendered from sampled viewpoints, the renderings are scored against the prompt via the diffusion model, and the ColorField outputs are iteratively updated. Mesh rendering and optimization are implemented in Q22_nerf_optimization.py using pytorch3d.
Initial Mesh
Text prompt: Black and white cow
Text prompt: Cow with tiger skin
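The texture optimization loop can be sketched as below. The names `render_fn` and `sds_grad_fn` are hypothetical stand-ins: in the actual assignment the renderer is a pytorch3d mesh renderer and the gradient comes from the SDS loss in SDS.py.

```python
import torch

def optimize_vertex_colors(vertices, render_fn, sds_grad_fn,
                           n_iters=100, lr=1e-2):
    # Unconstrained per-vertex parameters; sigmoid keeps colors in [0, 1].
    raw = torch.full((vertices.shape[0], 3), 0.5, requires_grad=True)
    opt = torch.optim.Adam([raw], lr=lr)
    for _ in range(n_iters):
        # Render the mesh with the current colors (the real renderer also
        # samples a random camera viewpoint each iteration).
        image = render_fn(vertices, torch.sigmoid(raw))
        # SDS gradient w.r.t. the rendering, applied via a surrogate loss
        # whose gradient w.r.t. `image` equals `grad`.
        grad = sds_grad_fn(image)
        loss = (grad.detach() * image).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(raw).detach()
```

Because the renderer is differentiable, the SDS gradient on the rendered image backpropagates through rasterization into the per-vertex colors.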