This website provides an exploration of the qualitative results presented in our paper.
*Videos and 3D shapes may take time to load; please give them a minute 😄*
Implicit Neural Representations (INRs) have piqued interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper, we explore the idea of editable INRs, focusing specifically on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional decrease in weights. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.
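To make the mechanism concrete, here is a minimal PyTorch sketch of the Local-Global idea. This is *not* the authors' implementation (see the linked repository for that); the class name, dimensions, and initialization here are illustrative only. The key point it demonstrates: each partition owns a batched slice of local sine-layer weights, all partitions share one global branch, and cropping simply deletes the corresponding local slices.

```python
import torch
import torch.nn as nn

class LocalGlobalSirenSketch(nn.Module):
    """Hypothetical sketch: per-partition local weights + one shared global branch."""

    def __init__(self, num_partitions, in_dim=2, local_dim=16,
                 global_dim=32, out_dim=1, omega=30.0):
        super().__init__()
        self.omega = omega
        # Local weights are stored batched, one slice per partition, so cropping
        # a partition amounts to deleting its slice -- no retraining required.
        # (Simplified init; a real SIREN uses a specific uniform scheme.)
        self.local_w1 = nn.Parameter(torch.randn(num_partitions, in_dim, local_dim) / in_dim)
        self.local_w2 = nn.Parameter(torch.randn(num_partitions, local_dim, out_dim) / local_dim)
        # The global branch is shared by all partitions (the small global share
        # quoted in the sections below).
        self.g1 = nn.Linear(in_dim, global_dim)
        self.g2 = nn.Linear(global_dim, out_dim)

    def forward(self, coords):
        # coords: (P, N, in_dim) -- N coordinates for each of the P surviving partitions.
        h = torch.sin(self.omega * torch.bmm(coords, self.local_w1))   # local sine layer
        local_out = torch.bmm(h, self.local_w2)
        global_out = self.g2(torch.sin(self.omega * self.g1(coords)))  # shared features
        return local_out + global_out

    def crop(self, keep_idx):
        # Remove encoded regions by dropping their local weights; the parameter
        # count shrinks proportionally to the cropped volume.
        self.local_w1 = nn.Parameter(self.local_w1.data[keep_idx])
        self.local_w2 = nn.Parameter(self.local_w2.data[keep_idx])

# A 512x512 image split into a 16x16 grid of partitions:
model = LocalGlobalSirenSketch(num_partitions=256)
coords = torch.rand(256, 1024, 2)      # 32x32 = 1024 pixels per partition
out = model(coords)                    # (256, 1024, 1)
model.crop(torch.arange(128))          # keep the first half of the partitions
out = model(coords[:128])              # the remaining half still decodes as before
```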
12-second video, resolution: 300×512×512
PSNR 31.90
Partition Factors
C0 = 5, C1 = 8, C2 = 8
PSNR 30.83
Partition Factors
C0 = 5, C1 = 8, C2 = 8
Local-Global SIRENs enhance the sharpness of the encoded videos while significantly reducing artifacts.
Examples (notice the cat's whiskers)
Based on our selected partition factors (C0 = 5, C1 = 8, C2 = 8), we can crop the video in 2.4-second intervals along the temporal dimension and in 64×64-pixel blocks along the spatial dimensions.
Number of parameters before cropping: 3.19M
Total number of partitions: 5×8×8 = 320
As presented in the paper, the global parameters take up 3.5% of the Local-Global SIREN's parameters (a quick sanity check of these counts appears after the examples below)
Video border
1.84M Parameters
Entire frames for 5 s
1.96M Parameters
Parts of the frames for 5 s
2.50M Parameters
Just for fun
1.65M Parameters
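The parameter counts above follow directly from the partition arithmetic. Here is a back-of-the-envelope check, assuming the 3.5% global share quoted above and a uniform split of local parameters across the 320 partitions (exact layer-wise counts may differ slightly):

```python
total = 3.19e6                                # parameters before cropping
local_per_part = (1 - 0.035) * total / 320    # ~9.6K local params per partition

print(12 / 5, 512 / 8)                        # crop granularity: 2.4 s, 64 px

# "Entire frames for 5 s": dropping 2 of 5 temporal slices
# removes 2 * 8 * 8 = 128 partitions.
print(total - 128 * local_per_part)           # ~1.96M, matching the count above

# "Video border": dropping the outer ring of the 8x8 spatial grid across
# all 5 temporal slices removes (64 - 36) * 5 = 140 partitions.
print(total - 140 * local_per_part)           # ~1.84M
```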
Training progression example for encoding a 512×512 image. Each method is trained for 1k iterations.
Based on our selected partition factors (C0 = 16, C1 = 16), we can crop these images into 32×32-pixel partitions in the spatial dimensions.
Number of parameters before cropping: ~199K
Total number of partitions: 16×16=256
As presented in the paper, the global parameters take up ~10% of the Local-Global SIREN's parameters
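The same arithmetic applies here, again assuming the ~10% global share is split off and the local parameters are divided uniformly across partitions:

```python
total = 199e3                                # parameters before cropping
local_per_part = (1 - 0.10) * total / 256    # ~700 local params per 32x32 partition
print(512 / 16, round(local_per_part))       # 32.0 700
```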
Each network contains ~200K parameters and is trained for roughly 20 epochs
IoU: 0.9904
C0 = 8, C1 = 8, C2 = 8
IoU: 0.9920
C0 = 8, C1 = 8, C2 = 8
IoU: 0.9909
Each method is trained for 1k iterations
Data Sources
Video and audio samples are taken from SIREN
3D shape (Lucy) data is taken from The Stanford 3D Scanning Repository
RGB images are from the DIV2K dataset