Grounding and Enhancing Grid-based Models for Neural Fields

To appear at CVPR24 as an oral presentation.

Best Paper Award Candidate

Zelin Zhao, Fenglei Fan, Wenlong Liao, Junchi Yan

Dr. Fenglei Fan is the corresponding author.


ABSTRACT

  Many contemporary studies utilize grid-based models for neural field representation, but a systematic analysis of grid-based models is still missing, hindering the improvement of those models. Therefore, this paper introduces a theoretical framework for grid-based models. This framework points out that these models' approximation and generalization behaviors are determined by grid tangent kernels (GTK), which are intrinsic properties of grid-based models. The proposed framework facilitates a consistent and systematic analysis of diverse grid-based models. Furthermore, the introduced framework motivates the development of a novel grid-based model named the Multiplicative Fourier Adaptive Grid (MulFAGrid). The numerical analysis demonstrates that MulFAGrid exhibits a lower generalization bound than its predecessors, indicating its robust generalization performance. Empirical studies reveal that MulFAGrid achieves state-of-the-art performance in various tasks, including 2D image fitting, 3D signed distance field (SDF) reconstruction, and novel view synthesis, demonstrating superior representation ability.

Introduction

Neural fields are coordinate-based networks that represent a field, that is, a continuous parameterization of a physical quantity of an object or a scene. They have demonstrated significant success in various tasks in computer vision and beyond:

Image regression: (x, y) → RGB

SDF reconstruction: (x, y, z) → signed distance

Novel view synthesis: (x, y, z) → RGB, density

We focus on grid-based models.

A grid-based model takes a query coordinate x as input and passes it to an index function U, which selects a set of feature vectors w from the grid. The model then outputs a weighted average of these feature vectors, with weights given by the kernel function 𝜑.
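The description above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the grid is one-dimensional on [0, 1], each node stores a scalar feature instead of a vector, the index function returns the two enclosing nodes, and the kernel is plain linear interpolation. The names `index_function`, `kernel`, and `grid_model` are hypothetical.

```python
import math

def index_function(x, num_nodes):
    """U(x): indices of the two grid nodes that enclose x (assumed 1D grid on [0, 1])."""
    pos = min(max(x, 0.0), 1.0) * (num_nodes - 1)
    left = min(int(math.floor(pos)), num_nodes - 2)
    return [left, left + 1]

def kernel(x, node, num_nodes):
    """phi(x, node): linear-interpolation weight (an assumed kernel choice)."""
    pos = min(max(x, 0.0), 1.0) * (num_nodes - 1)
    return max(0.0, 1.0 - abs(pos - node))

def grid_model(x, w):
    """Weighted average of the selected features, weighted by the kernel."""
    num_nodes = len(w)
    return sum(kernel(x, j, num_nodes) * w[j] for j in index_function(x, num_nodes))

# Four grid nodes with scalar features (illustrative values).
w = [0.0, 1.0, 4.0, 9.0]
print(grid_model(0.5, w))  # halfway between nodes 1 and 2
```

Swapping in a different `index_function` or `kernel` yields a different grid-based model while the overall structure stays the same.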

Many state-of-the-art approaches adopt grid-based models to model neural fields, such as InstantNGP, NeuRBF, and NFFB.

The GTK theory

Our theoretical results show that the approximation and generalization performance of grid-based models is determined by the grid tangent kernel (GTK), a positive semidefinite matrix.

Theorem 1 (informal version). When optimized by gradient descent, the outputs of grid-based models evolve following an ordinary differential equation (ODE) related to the grid tangent kernel (GTK).

Theorem 2 (informal version). The GTK of a grid-based model stays stationary during training.

Theorem 3 (informal version). The generalization bound of a grid-based model is associated with the inverse of its GTK.
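To make the first two statements concrete, here is a small numerical sketch under strong assumptions. It assumes the GTK has the usual tangent-kernel form G = J Jᵀ, where J is the Jacobian of the model outputs with respect to the grid weights, and it uses a toy 1D model with a linear-interpolation kernel and a squared-error loss. Because this toy model is linear in its weights, J (and therefore G) does not depend on the weights at all, so the stationarity here is trivial. The sketch only illustrates the flavor of the theorems, not their general form or proofs, and all function names are hypothetical.

```python
def hat_features(x, num_nodes):
    """Row of the Jacobian dg/dw at x: linear-interpolation weights on a 1D grid."""
    pos = min(max(x, 0.0), 1.0) * (num_nodes - 1)
    return [max(0.0, 1.0 - abs(pos - j)) for j in range(num_nodes)]

def jacobian(xs, num_nodes):
    """Stack one Jacobian row per training coordinate."""
    return [hat_features(x, num_nodes) for x in xs]

def gtk(J):
    """Assumed tangent-kernel form G = J J^T (n x n, positive semidefinite)."""
    return [[sum(a * b for a, b in zip(ri, rk)) for rk in J] for ri in J]

def predict(J, w):
    """Model outputs at the training coordinates (the model is linear in w)."""
    return [sum(a * wj for a, wj in zip(row, w)) for row in J]

def gd_step(J, w, ys, lr):
    """One gradient-descent step on 0.5 * sum((g(x_i) - y_i)^2)."""
    residual = [p - y for p, y in zip(predict(J, w), ys)]
    grad = [sum(J[i][j] * residual[i] for i in range(len(J))) for j in range(len(w))]
    return [wj - lr * g for wj, g in zip(w, grad)]

xs = [0.1, 0.4, 0.9]          # training coordinates (illustrative)
ys = [1.0, -0.5, 2.0]         # training targets (illustrative)
w0 = [0.0, 0.0, 0.0, 0.0]     # initial grid weights
lr = 0.1

J = jacobian(xs, len(w0))
G = gtk(J)
before = predict(J, w0)
after = predict(J, gd_step(J, w0, ys, lr))

# Discrete analogue of the ODE: change in outputs = -lr * G @ (outputs - targets).
residual = [p - y for p, y in zip(before, ys)]
predicted_change = [-lr * sum(G[i][k] * residual[k] for k in range(len(xs)))
                    for i in range(len(xs))]
actual_change = [a - b for a, b in zip(after, before)]
print(predicted_change)
print(actual_change)
```

The two printed lists agree up to floating-point error, which is the discrete counterpart of the outputs following an ODE driven by the kernel.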

Model diagrams

(Left) The diagram of MulFAGrid. The input query coordinate is passed to the multiplicative filter to produce Fourier features and then sent to the normalization layer to compute the aggregation weights. 

(Right) The full architecture for neural radiance fields (NeRF). We obtain the densities via a MulFAGrid followed by an activation function. For the colors c, we encode the position x via a MulFAGrid and post-process the resulting features with an MLP. We then combine the queried spatial features with ray direction information to produce the color predictions.
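As a rough illustration of the left diagram, the sketch below passes a scalar coordinate through a product of sinusoidal filters and then normalizes the result into aggregation weights over grid features. Everything specific here is an assumption made for illustration and is not taken from the paper: the filter is a plain product of sines (in the spirit of multiplicative filter networks), the normalization is a softmax, the features are scalars, and all names and parameter values are hypothetical.

```python
import math

def multiplicative_fourier_features(x, freqs, phases):
    """One value per grid node: a product over layers of sin(freq * x + phase)."""
    num_nodes = len(freqs[0])
    feats = [1.0] * num_nodes
    for layer_freqs, layer_phases in zip(freqs, phases):
        for j in range(num_nodes):
            feats[j] *= math.sin(layer_freqs[j] * x + layer_phases[j])
    return feats

def normalize_weights(feats):
    """Assumed normalization layer: a numerically stable softmax."""
    m = max(feats)
    exps = [math.exp(f - m) for f in feats]
    total = sum(exps)
    return [e / total for e in exps]

def toy_adaptive_grid_output(x, freqs, phases, w):
    """Aggregate the grid features w with the normalized weights."""
    weights = normalize_weights(multiplicative_fourier_features(x, freqs, phases))
    return sum(a * wj for a, wj in zip(weights, w))

# Two filter layers over three grid nodes (illustrative values only).
freqs = [[1.0, 2.0, 3.0], [0.5, 1.5, 2.5]]
phases = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]
w = [0.2, -1.0, 3.0]

weights = normalize_weights(multiplicative_fourier_features(0.7, freqs, phases))
output = toy_adaptive_grid_output(0.7, freqs, phases, w)
print(weights, output)
```

Because the weights are non-negative and sum to one, the output is always a convex combination of the grid features.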

Numerical results

Comparison curves of several grid-based models: InstantNGP, NFFB, NeuRBF, and ours. 

Left: Training curves of the image regression task on the Kodak dataset. 

Right: The evolution of the normal angular error (NAE) during training on the 3D SDF reconstruction task.
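For readers unfamiliar with the metric in the right panel, a normal angular error can be computed as the mean angle between predicted and reference surface normals. The sketch below is one common formulation, written under assumptions: the averaging scheme, the use of degrees, and the function names are illustrative and may differ from the evaluation protocol used for the curves.

```python
import math

def unit(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def normal_angular_error(pred_normals, true_normals):
    """Mean angle in degrees between corresponding normals (assumed definition)."""
    angles = []
    for p, t in zip(pred_normals, true_normals):
        p, t = unit(p), unit(t)
        cos = sum(a * b for a, b in zip(p, t))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)

# Identical normals give zero error; perpendicular normals give 90 degrees.
same = normal_angular_error([[0.0, 0.0, 2.0]], [[0.0, 0.0, 1.0]])
perpendicular = normal_angular_error([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]])
print(same, perpendicular)
```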

Contacts

We welcome questions and discussion about this work. Zelin Zhao can be reached at sjtuytc [at] gmail.com; alternatively, you can email Prof. Fenglei Fan.