Single Image 3D Reconstruction

Yufan Liu, Peizhi Li, guided by Prof. James O'Brien

The Task

Single-image 3D reconstruction is an exciting field. In this report, we explore and implement a pipeline for fast, high-resolution 3D reconstruction from a single-view image, built on the open-source stable-dreamfusion baseline. We achieved high-resolution reconstructions with satisfactory visual quality, fine detail, and realistic texture.

Method Basics

Framework Overview

The framework produces high-quality object reconstructions with Neural Radiance Fields (NeRF) by incorporating a guidance model, Zero-1-to-3 (Zero123). This is a pre-trained model that computes meaningful loss terms from views of the NeRF sampled at multiple camera poses. In addition, the NeRF is supervised with high-quality labels, namely the depth and normal estimates of the raw input image.
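The multi-perspective sampling mentioned above can be illustrated with a minimal sketch that draws a random camera on a sphere around the object and builds a look-at pose. The function name `sample_camera_pose`, the radius, the elevation range, and the y-up, OpenGL-style convention are all assumptions made for illustration; they are not taken from the project code.

```python
import numpy as np

def sample_camera_pose(rng, radius=3.0, elev_range=(-30.0, 60.0)):
    """Sample a camera on a sphere, looking at the origin.

    Hypothetical helper: returns (elevation_deg, azimuth_deg, 4x4 camera-to-world).
    Assumes a y-up world and a camera that looks down its local -z axis.
    """
    elev = rng.uniform(*elev_range)      # elevation kept away from the poles
    azim = rng.uniform(-180.0, 180.0)    # full turn around the object
    e, a = np.deg2rad(elev), np.deg2rad(azim)

    # Camera position in world coordinates
    pos = radius * np.array([np.cos(e) * np.sin(a), np.sin(e), np.cos(e) * np.cos(a)])

    # Orthonormal camera frame pointing at the origin
    forward = -pos / np.linalg.norm(pos)
    world_up = np.array([0.0, 1.0, 0.0])
    right = np.cross(forward, world_up)
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)

    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = up
    c2w[:3, 2] = -forward                # camera looks along -z
    c2w[:3, 3] = pos
    return elev, azim, c2w

rng = np.random.default_rng(0)
elev, azim, c2w = sample_camera_pose(rng)
```

Each sampled pose would be rendered by the NeRF and passed to the guidance model; the front (reference) view corresponds to one fixed pose in the same convention.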

Supervised Loss

During supervision, the loss is computed from the difference between the model's predictions and the ground truth for RGB values, mask values, normal vectors (cosine similarity between predicted and ground-truth normals), and depth (Pearson correlation between predicted and ground-truth depths). In DMTet fine-tuning mode, additional mesh-related losses (normal and Laplacian) are added.
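As a rough sketch of the four supervised terms, the NumPy functions below compute a mean-squared error for RGB and mask, a cosine-similarity term for normals, and a Pearson-correlation term for depth. Turning each similarity into a loss as "1 minus similarity" and the function names are assumptions for illustration; the actual weighting and reduction used in the project may differ.

```python
import numpy as np

def mse_loss(pred, gt):
    """Mean-squared error, usable for both RGB and mask values."""
    return float(np.mean((pred - gt) ** 2))

def normal_loss(pred_n, gt_n, eps=1e-8):
    """1 - mean cosine similarity between predicted and reference normals (last axis = xyz)."""
    pred_u = pred_n / (np.linalg.norm(pred_n, axis=-1, keepdims=True) + eps)
    gt_u = gt_n / (np.linalg.norm(gt_n, axis=-1, keepdims=True) + eps)
    cos = np.sum(pred_u * gt_u, axis=-1)
    return float(np.mean(1.0 - cos))

def depth_loss(pred_d, gt_d, eps=1e-8):
    """1 - Pearson correlation; invariant to the scale and shift of either depth map."""
    p = pred_d.ravel() - pred_d.mean()
    g = gt_d.ravel() - gt_d.mean()
    corr = np.sum(p * g) / (np.sqrt(np.sum(p ** 2) * np.sum(g ** 2)) + eps)
    return float(1.0 - corr)

# Toy example on random 8x8 "images"
rng = np.random.default_rng(0)
rgb_pred, rgb_gt = rng.random((8, 8, 3)), rng.random((8, 8, 3))
depth_pred = rng.random((8, 8))
total = mse_loss(rgb_pred, rgb_gt) + depth_loss(depth_pred, 2.0 * depth_pred + 1.0)
```

Pearson correlation suits the depth term because a monocular depth estimate is typically known only up to an unknown scale and shift, which the correlation ignores.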

Guidance Model

Zero-1-to-3 (Zero123) serves as the guidance mechanism during NeRF training. It provides additional cues to the NeRF so that the rendered outputs are consistent across multiple novel views rather than only the front view. In essence, Zero-1-to-3 is a pre-trained model that takes views sampled from the NeRF at different camera poses and computes a loss between each sampled view and the corresponding prediction derived from the reference view. Zero-1-to-3 works in latent space: it does not directly predict what the current view should look like. Instead, it predicts adjustments (in the form of noise) to the latent representation, which are then used to guide the NeRF's training. This guidance is conditioned on the difference between the current view and the reference view or views.
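The noise-based guidance described above resembles score distillation as used in DreamFusion-style pipelines. The sketch below, under that assumption, noises the latent of a rendered view, asks a denoiser for its noise estimate, and uses the weighted difference as a gradient on the latent. `noise_predictor` is a hypothetical stand-in for the Zero-1-to-3 denoiser; its conditioning on the reference image and relative camera pose is omitted, and the weighting `1 - alpha_t` is one common choice rather than a confirmed detail of this project.

```python
import numpy as np

def guidance_gradient(latents, noise_predictor, alphas_cumprod, t, rng):
    """Score-distillation-style gradient on the latents of a rendered view.

    noise_predictor(noisy_latents, t) is a hypothetical denoiser interface.
    """
    noise = rng.standard_normal(latents.shape)
    a_t = alphas_cumprod[t]
    # Forward diffusion: mix the clean latent with Gaussian noise at timestep t
    noisy = np.sqrt(a_t) * latents + np.sqrt(1.0 - a_t) * noise
    noise_pred = noise_predictor(noisy, t)
    w = 1.0 - a_t                       # assumed timestep weighting
    return w * (noise_pred - noise)     # gradient passed back to the NeRF render

# Toy denoiser that "believes" the clean latent should equal `target`
alphas_cumprod = np.linspace(0.99, 0.01, 100)
target = np.ones((4, 8, 8))

def toy_predictor(noisy, t):
    a_t = alphas_cumprod[t]
    return (noisy - np.sqrt(a_t) * target) / np.sqrt(1.0 - a_t)

rng = np.random.default_rng(0)
latents = np.zeros((4, 8, 8))
for _ in range(200):
    t = int(rng.integers(20, 80))
    latents -= 0.1 * guidance_gradient(latents, toy_predictor, alphas_cumprod, t, rng)
```

In the toy loop, descending along the gradient moves the latent toward the denoiser's belief about the clean image; in the real pipeline the same gradient would flow through the renderer into the NeRF parameters.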

Pipeline Basics

This project is built upon the open-source code base "stable-dreamfusion".
