Effective ways to construct neural networks that preserve group equivariance (or invariance). This is useful when dealing with physical models.
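One standard construction (a minimal sketch only; the C4 rotation group, the architecture, and all names below are illustrative assumptions, not a proposed answer) is to symmetrize an arbitrary network by averaging its output over the group orbit of the input, which gives exact invariance for a finite group:

```python
import numpy as np

rng = np.random.default_rng(0)

# A generic two-layer network f(x) = W2 @ relu(W1 @ x + b1); architecture is illustrative.
W1, b1 = rng.normal(size=(32, 2)), rng.normal(size=32)
W2 = rng.normal(size=(1, 32))

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# Finite rotation group C4 acting on R^2 (rotations by multiples of 90 degrees).
def rot(k):
    c, s = np.cos(k * np.pi / 2), np.sin(k * np.pi / 2)
    return np.array([[c, -s], [s, c]])

group = [rot(k) for k in range(4)]

# Group averaging: f_inv(x) = (1/|G|) * sum_{g in G} f(g x) is exactly C4-invariant.
def f_invariant(x):
    return np.mean([f(g @ x) for g in group], axis=0)

x = rng.normal(size=2)
print(f_invariant(x), f_invariant(group[1] @ x))  # the two values agree (up to round-off)
```

Equivariance can be obtained analogously by also acting on the output before averaging, or by constraining the weights themselves (weight sharing, group convolutions), which avoids the |G|-fold cost of the average.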
Deep neural network coarsening: given a trained network, provide an efficient algorithm to construct a smaller network approximation (in width, depth, or FLOPs). The error can be measured in the operator norm.
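As a concrete baseline (a sketch under the assumption of a plain feed-forward network, not the algorithm asked for), one can truncate the singular values of each weight matrix; by the Eckart-Young theorem the layer-wise operator-norm error is exactly the first discarded singular value:

```python
import numpy as np

def coarsen_layer(W, rank):
    """Rank-r truncation of a weight matrix; the spectral-norm error equals sigma_{r+1}."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_small = U[:, :rank] * s[:rank] @ Vt[:rank]   # same shape, rank <= `rank`
    err = s[rank] if rank < len(s) else 0.0        # operator-norm error of this layer
    return W_small, err

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 256)) / 16.0
W_small, err = coarsen_layer(W, rank=64)
print(err, np.linalg.norm(W - W_small, ord=2))     # the two numbers coincide
```

To actually reduce width and FLOPs one stores the two factors U_r diag(s_r) and V_r^T as a pair of thinner layers; propagating the per-layer operator-norm errors through the Lipschitz constants of the remaining layers gives a crude end-to-end bound.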
A general theory for inferring neural network structure and weights from the full input-output mapping. This has been proved for feed-forward networks with analytic activation functions by examining the wave front set (singularities); a general theory is still absent.
Using neural networks (e.g., feed-forward) to approximate a rank-one two-variable function f(x) g(y), is it possible to prove convergence of the training in any sense? The intuition is that single-variable function approximation works well when more than two layers are used. [The loss function should be modified.]
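A minimal numerical setup for this question (the choices f(x) = sin(pi x), g(y) = exp(-y^2), the two-hidden-layer tanh network, and plain gradient descent on the ordinary least-squares loss are all illustrative assumptions; the bracketed remark about modifying the loss is not reflected here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Target: rank-one function f(x) g(y); the specific f, g below are illustrative choices.
def target(x, y):
    return np.sin(np.pi * x) * np.exp(-y**2)

# Training data on a grid in [-1, 1]^2.
xs, ys = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))
X = np.stack([xs.ravel(), ys.ravel()], axis=1)          # (400, 2)
T = target(X[:, 0], X[:, 1])[:, None]                   # (400, 1)

# Small network with two hidden tanh layers, trained by full-batch gradient descent.
sizes = [2, 32, 32, 1]
Ws = [rng.normal(size=(m, n)) / np.sqrt(m) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

def forward(X):
    acts = [X]
    for W, b in zip(Ws[:-1], bs[:-1]):
        acts.append(np.tanh(acts[-1] @ W + b))
    acts.append(acts[-1] @ Ws[-1] + bs[-1])              # linear output layer
    return acts

lr = 0.05
for step in range(5000):
    acts = forward(X)
    err = acts[-1] - T
    loss = np.mean(err**2)
    # Manual backpropagation through the linear output layer and the tanh layers.
    grad = 2 * err / len(X)
    for i in reversed(range(len(Ws))):
        gW = acts[i].T @ grad
        gb = grad.sum(axis=0)
        if i > 0:
            grad = (grad @ Ws[i].T) * (1 - acts[i]**2)   # tanh'(z) = 1 - tanh(z)^2
        Ws[i] -= lr * gW
        bs[i] -= lr * gb
    if step % 1000 == 0:
        print(step, loss)
```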
Is it possible to sketch the loss landscape by estimating the average distance between two local minima in a probabilistic sense?
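One crude empirical proxy (an assumption about what "in a probabilistic sense" could mean, not the intended formal definition) is to train the same small model from many random initializations and look at the distribution of pairwise distances between the located minima; note that this ignores the permutation symmetry of the weights.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Tiny regression problem; the data and the one-hidden-layer tanh model are toy assumptions.
X = np.linspace(-1, 1, 40)[:, None]
T = np.sin(3 * X)

H = 8  # hidden width; parameters are packed into one flat vector of length 3*H + 1

def loss(p):
    w1, b1, w2, b2 = p[:H], p[H:2*H], p[2*H:3*H], p[3*H]
    h = np.tanh(X * w1 + b1)            # (40, H)
    y = h @ w2 + b2                     # (40,)
    return np.mean((y - T[:, 0])**2)

# Collect local minima from many random initializations.
minima = []
for _ in range(30):
    p0 = rng.normal(size=3 * H + 1)
    res = minimize(loss, p0, method="BFGS")
    minima.append(res.x)

# Average pairwise distance between the located minima (permutation symmetry is ignored).
minima = np.array(minima)
d = np.linalg.norm(minima[:, None, :] - minima[None, :, :], axis=-1)
print("mean pairwise distance:", d[np.triu_indices(len(minima), k=1)].mean())
```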
In the 1D ReLU network approximation, the Gram matrix has fast eigenvalue decay, which means that training along the gradient flow slows down after a short while. Is there an argument showing that the parameters cannot move too far? [The decay is in fact fourth order and can be found explicitly.]
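The decay can be observed numerically; the sketch below (an illustration with an arbitrary random two-layer ReLU model and uniform 1D samples) forms the empirical Gram matrix G_ij = <grad_theta f(x_i), grad_theta f(x_j)> over all parameters and prints its leading eigenvalues against the claimed fourth-order rate:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two-layer 1D ReLU model f(x) = sum_k a_k * relu(w_k * x + b_k); width and the
# sampling of x are illustrative choices.
m = 200                                  # hidden width
w = rng.normal(size=m)
b = rng.normal(size=m)
a = rng.normal(size=m) / np.sqrt(m)

x = np.linspace(-1, 1, 400)              # sample points

# Per-sample gradient of f with respect to all parameters (a, w, b).
z = np.outer(x, w) + b                   # (N, m) pre-activations
active = (z > 0).astype(float)
grad_a = np.maximum(z, 0.0)              # df/da_k = relu(w_k x + b_k)
grad_w = a * active * x[:, None]         # df/dw_k = a_k x 1{z_k > 0}
grad_b = a * active                      # df/db_k = a_k 1{z_k > 0}
J = np.concatenate([grad_a, grad_w, grad_b], axis=1)   # (N, 3m) Jacobian

# Empirical Gram matrix G_ij = <grad_theta f(x_i), grad_theta f(x_j)> and its spectrum.
G = J @ J.T / m
eigs = np.sort(np.linalg.eigvalsh(G))[::-1]
k = np.arange(1, 41)
print(np.c_[k, eigs[:40], eigs[:40] * k**4.0])  # third column: eigenvalues rescaled by k^4
```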
In n dimensions, the Gram matrix eigenvalues decay at the rate O(k^{-(n+3)/n}). There should be about O(u^{-f(n)}) frequencies. A conjecture is that f(n) tends to a constant independent of the dimension. [solved]
Although it is straightforward to understand the hardness of approximating oscillatory functions with networks, geometric optics seems to be an exception, since the phase function must satisfy certain Hamilton-Jacobi (or similar) equations, and the phase itself need not be highly oscillatory at all. Therefore, many inverse problems based on the orthogonality relation could be solved in network form through a search over geometric-optics ansatzes.
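For reference (a standard fact stated only to make the constraint explicit; the notation $a$, $\phi$, $c$, $\omega$ is introduced here, not in the note above): with the WKB ansatz $u(x) \approx a(x) e^{i\omega \phi(x)}$ for the Helmholtz equation $\Delta u + \omega^2 c(x)^{-2} u = 0$, matching orders of $\omega$ gives the eikonal and transport equations
$$
|\nabla \phi(x)|^{2} = \frac{1}{c(x)^{2}}, \qquad 2\,\nabla\phi\cdot\nabla a + a\,\Delta\phi = 0,
$$
so the phase is determined by a Hamilton-Jacobi equation and need not be resolved at the wavelength scale; only the factor $e^{i\omega\phi}$ is oscillatory.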
Network training with Adam cannot be accelerated by the usual sequence-acceleration algorithms, such as Aitken extrapolation or the epsilon methods; part of the reason is the roughness of the optimization path. It is not yet known whether acceleration is guaranteed for other algorithms with smoother paths.
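For concreteness, Aitken's delta-squared extrapolation applied to a scalar loss sequence looks as follows (a generic sketch; the toy sequences are illustrative and not produced by Adam). It makes the failure mode visible: on a rough path the second difference in the denominator is dominated by noise.

```python
import numpy as np

def aitken(seq):
    """Aitken's delta-squared extrapolation of a scalar sequence."""
    s = np.asarray(seq, dtype=float)
    d1 = s[1:-1] - s[:-2]                # forward differences Delta s_n
    d2 = s[2:] - 2 * s[1:-1] + s[:-2]    # second differences Delta^2 s_n
    return s[:-2] - d1**2 / d2           # accelerated sequence (ill-defined where d2 ~ 0)

rng = np.random.default_rng(5)
n = np.arange(60)

smooth = 1.0 + 0.9**n                                  # smooth geometric convergence
rough = 1.0 + 0.9**n + 0.01 * rng.normal(size=60)      # noisy path, as along an Adam run

print("smooth:", aitken(smooth)[-5:])   # essentially the limit 1.0
print("rough :", aitken(rough)[-5:])    # erratic; noise dominates the denominator
```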
A rigorous proof is still needed for the multi-component, multi-layer NN, especially with Fourier activations.