This website provides an exploration of the qualitative results presented in our paper.
*Videos and 3D shapes may take time to load; please give them a minute 😄*
Implicit Neural Representations (INRs) have piqued interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper, we explore the idea of editable INRs, focusing specifically on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional decrease in weights. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.
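To make the mechanism concrete, here is a minimal PyTorch sketch of the Local-Global idea. This is *not* the authors' implementation (see the linked repository for that); the class name, dimensions, and initialization here are illustrative only. The key point it demonstrates: each partition owns a batched slice of local sine-layer weights, all partitions share one global branch, and cropping simply deletes the corresponding local slices.

```python
import torch
import torch.nn as nn

class LocalGlobalSirenSketch(nn.Module):
    """Hypothetical sketch: per-partition local weights + one shared global branch."""

    def __init__(self, num_partitions, in_dim=2, local_dim=16,
                 global_dim=32, out_dim=1, omega=30.0):
        super().__init__()
        self.omega = omega
        # Local weights are stored batched, one slice per partition, so cropping
        # a partition amounts to deleting its slice -- no retraining required.
        # (Simplified init; a real SIREN uses a specific uniform scheme.)
        self.local_w1 = nn.Parameter(torch.randn(num_partitions, in_dim, local_dim) / in_dim)
        self.local_w2 = nn.Parameter(torch.randn(num_partitions, local_dim, out_dim) / local_dim)
        # The global branch is shared by all partitions (the small global share
        # quoted in the sections below).
        self.g1 = nn.Linear(in_dim, global_dim)
        self.g2 = nn.Linear(global_dim, out_dim)

    def forward(self, coords):
        # coords: (P, N, in_dim) -- N coordinates for each of the P surviving partitions.
        h = torch.sin(self.omega * torch.bmm(coords, self.local_w1))   # local sine layer
        local_out = torch.bmm(h, self.local_w2)
        global_out = self.g2(torch.sin(self.omega * self.g1(coords)))  # shared features
        return local_out + global_out

    def crop(self, keep_idx):
        # Remove encoded regions by dropping their local weights; the parameter
        # count shrinks proportionally to the cropped volume.
        self.local_w1 = nn.Parameter(self.local_w1.data[keep_idx])
        self.local_w2 = nn.Parameter(self.local_w2.data[keep_idx])

# A 512x512 image split into a 16x16 grid of partitions:
model = LocalGlobalSirenSketch(num_partitions=256)
coords = torch.rand(256, 1024, 2)      # 32x32 = 1024 pixels per partition
out = model(coords)                    # (256, 1024, 1)
model.crop(torch.arange(128))          # keep the first half of the partitions
out = model(coords[:128])              # the remaining half still decodes as before
```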
12-second video, resolution: 300×512×512
PSNR 31.90
Partition Factors
C0 = 5, C1 = 8, C2 = 8
PSNR 30.83
Partition Factors
C0 = 5, C1 = 8, C2 = 8
Local-Global SIRENs enhance the sharpness of the encoded videos while significantly reducing artifacts.
Examples (notice the cat's whiskers)
Based on our selected partition factors (C0 = 5, C1 = 8, C2 = 8), we can crop the video in 2.4-second intervals along the temporal dimension and in 64×64-pixel blocks along the spatial dimensions.
Number of parameters before cropping: 3.19M
Total number of partitions: 5×8×8 = 320
As presented in the paper, the global parameters take up 3.5% of the Local-Global SIREN's parameters (a quick sanity check of these counts appears after the examples below)
Video border
1.84M Parameters
Entire frames for 5 s
1.96M Parameters
Parts of the frames for 5 s
2.50M Parameters
Just for fun
1.65M Parameters
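The parameter counts above follow directly from the partition arithmetic. Here is a back-of-the-envelope check, assuming the 3.5% global share quoted above and a uniform split of local parameters across the 320 partitions (exact layer-wise counts may differ slightly):

```python
total = 3.19e6                                # parameters before cropping
local_per_part = (1 - 0.035) * total / 320    # ~9.6K local params per partition

print(12 / 5, 512 / 8)                        # crop granularity: 2.4 s, 64 px

# "Entire frames for 5 s": dropping 2 of 5 temporal slices
# removes 2 * 8 * 8 = 128 partitions.
print(total - 128 * local_per_part)           # ~1.96M, matching the count above

# "Video border": dropping the outer ring of the 8x8 spatial grid across
# all 5 temporal slices removes (64 - 36) * 5 = 140 partitions.
print(total - 140 * local_per_part)           # ~1.84M
```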
Training progression example for encoding a 512×512 image. Each method is trained for 1k iterations.
Based on our selected partition factors (C0 = 16, C1 = 16), we can crop these images into 32×32-pixel partitions in the spatial dimensions.
Number of parameters before cropping: ~199K
Total number of partitions: 16×16=256
As presented in the paper, the global parameters take up ~10% of the Local-Global SIREN's parameters
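The same arithmetic applies here, again assuming the ~10% global share is split off and the local parameters are divided uniformly across partitions:

```python
total = 199e3                                # parameters before cropping
local_per_part = (1 - 0.10) * total / 256    # ~700 local params per 32x32 partition
print(512 / 16, round(local_per_part))       # 32.0 700
```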
Each network contains ~200K parameters and is trained for roughly 20 epochs
IoU: 0.9904
C0 = 8, C1 = 8, C2 = 8
IoU: 0.9920
C0 = 8, C1 = 8, C2 = 8
IoU: 0.9909
Each method is trained for 1k iterations
Data Sources
Video and audio samples are taken from SIREN
3D shape (Lucy) data is taken from The Stanford 3D Scanning Repository
RGB images are from the DIV2K dataset