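It is a known result due to G. Cybenko that any "reasonable" classification or regression function f(x) (in his theorem, any continuous function on the unit hypercube) can be approximated to arbitrary accuracy by a shallow model with a single hidden layer, although the number of neurons N required may be very large:

$$f(x) \approx \sum_{j=1}^{N} \alpha_j \, \sigma\left(y_j^{T} x + \theta_j\right)$$

Here $\alpha_j$ are coefficients, $x$ is the vector of inputs, $y_j$ is a vector of weights, so that $y_j^{T} x$ is an inner product, $\theta_j$ is a number (the bias), and $\sigma$ is a sigmoidal activation function.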
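To make the form of such a shallow model concrete, here is a minimal sketch in plain Python. The names `sigmoid` and `shallow_net` are hypothetical, chosen only for this illustration, and the logistic function is just one possible sigmoidal activation. The hand-picked two-neuron parameters are likewise an assumption: they build a smooth "bump" that is close to 1 on the interval (0, 1) and close to 0 elsewhere, which hints at how sums of shifted sigmoids can assemble more complicated functions.

```python
import math


def sigmoid(t):
    """Logistic sigmoid, one common choice of sigmoidal activation."""
    return 1.0 / (1.0 + math.exp(-t))


def shallow_net(x, alphas, ys, thetas):
    """Evaluate sum_j alpha_j * sigmoid(y_j . x + theta_j) for one input vector x."""
    total = 0.0
    for alpha, y, theta in zip(alphas, ys, thetas):
        inner = sum(y_i * x_i for y_i, x_i in zip(y, x))  # inner product y_j^T x
        total += alpha * sigmoid(inner + theta)
    return total


# Illustrative parameters (not from the source): two neurons on a 1-D input.
# sigmoid(50 * x) switches on near x = 0, and sigmoid(50 * x - 50) near x = 1,
# so their difference is roughly an indicator of the interval (0, 1).
alphas = [1.0, -1.0]
ys = [[50.0], [50.0]]
thetas = [0.0, -50.0]

for point in (-1.0, 0.5, 2.0):
    print(point, shallow_net([point], alphas, ys, thetas))
```

A finer approximation of a general function would use many such bumps with different heights, which is one intuitive reason the number of neurons can grow very large.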