One way of understanding how information is stored in a weighted sum is with <vector, scalar> associations, where the vector acts as the input to the weighted sum and the output is the scalar sum value. Using a training algorithm it is possible to find a weight vector that can (a code sketch follows the list):
Accurately (and with error reduction) map vectors to scalar values (the under-capacity case).
Accurately (but with no error correction) map vectors to scalar values (the at-capacity case).
Inexactly (but in a statistically significant way) map vectors to scalar values (the over-capacity case).
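The training algorithm itself isn't given in this section, so as a minimal sketch (assuming a least-squares fit stands in for it; the demo's actual algorithm may differ) here is how <vector, scalar> associations can be stored in and recalled from a weight vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_weights(vectors, scalars):
    # Least-squares fit: find w minimizing sum((v . w - s)^2) over the
    # stored associations. Exact under/at capacity (for linearly
    # independent vectors), approximate over capacity.
    V = np.asarray(vectors, dtype=float)   # one input vector per row
    s = np.asarray(scalars, dtype=float)
    w, *_ = np.linalg.lstsq(V, s, rcond=None)
    return w

def recall(weights, vector):
    # The weighted sum (dot product) is the recall operation.
    return float(np.dot(weights, vector))

# Store 2 associations in a 3-term weighted sum (under capacity).
vectors = rng.normal(size=(2, 3))
scalars = [1.5, -0.7]
w = train_weights(vectors, scalars)
for v, s in zip(vectors, scalars):
    print(s, recall(w, v))    # recalled scalars match the targets
```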
If you store 1 <vector, scalar> association the weight vector aligns with the chosen input vector. You can use the variance equation for linear combinations of random variables to deduce that in the typical case there is a reduction in variation of the output scalar value compared to that of the (vector) input values.
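A quick numerical check of that deduction (a sketch; the variance equation used here is Var(sum_i w_i*x_i) = sum_i w_i^2 * Var(x_i) for independent inputs):

```python
import numpy as np

rng = np.random.default_rng(1)

# One stored association: w aligns with the input vector v, since
# w = (s / |v|^2) * v gives v . w = s exactly.
v = rng.normal(size=3)
s = 1.0
w = (s / np.dot(v, v)) * v

# Variance equation for a linear combination of independent inputs:
# Var(sum_i w_i * x_i) = sum_i w_i**2 * Var(x_i) = |w|^2 for
# unit-variance inputs. Measure it by probing with i.i.d. noise.
x = rng.normal(size=(100_000, 3))
print(np.dot(w, w), np.var(x @ w))
# Typically |w| < 1 here, so the output scalar varies less than any
# single input component does.
```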
If you store 2 <vector, scalar> associations there now typically exists an angle between each input vector and the weight vector. The length of the weight vector must increase to continue to provide the required scalar outputs (since at 90 degrees a weighted sum, a.k.a. a dot product, outputs only zero). The variance equation then indicates the output is more sensitive to noise on the inputs.
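A sketch of the same effect numerically (assumptions as before: least-squares training, random Gaussian inputs):

```python
import numpy as np

rng = np.random.default_rng(2)

V = rng.normal(size=(2, 3))               # two stored input vectors
s = np.array([1.0, 1.0])                  # their target scalars

# Weight vector for the first association alone (aligned with V[0]).
w1 = (s[0] / np.dot(V[0], V[0])) * V[0]

# Weight vector satisfying both associations at once.
w2, *_ = np.linalg.lstsq(V, s, rcond=None)

print(np.linalg.norm(w1), np.linalg.norm(w2))   # w2 is typically longer
# Angle between each input vector and w2 (nonzero in general; the
# cosine shrinks each dot product, so |w2| must grow to compensate).
cos = (V @ w2) / (np.linalg.norm(V, axis=1) * np.linalg.norm(w2))
print(np.degrees(np.arccos(cos)))
```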
The capacity is at most the number of weight terms in the weighted sum (it is lower when the input vectors are strongly aligned).
Used over capacity, exact recall is not expected; instead the recalled scalar values become contaminated with Gaussian-distributed noise (despite exact inputs). The further over capacity, the more noise.
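A sketch that makes the capacity limit visible (assuming least-squares training on random vectors; dim plays the role of the number of weight terms):

```python
import numpy as np

rng = np.random.default_rng(3)

dim = 16                                  # number of weight terms
for n in (8, 16, 32, 64):                 # under, at, and over capacity
    V = rng.normal(size=(n, dim))
    s = rng.normal(size=n)
    w, *_ = np.linalg.lstsq(V, s, rcond=None)
    rms_error = np.sqrt(np.mean((V @ w - s) ** 2))
    print(n, rms_error)
# Recall error is ~0 up to n = dim associations, then grows the
# further over capacity the weighted sum is used, with the errors
# looking like Gaussian noise on the recalled scalars.
```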
Fortunately the weight vector magnitude does not increase in the over-capacity case. Sensitivity to input noise seems to peak at exact capacity.
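And a companion sketch for the weight vector magnitude (same assumptions), which by the variance equation is the output's gain on input noise:

```python
import numpy as np

rng = np.random.default_rng(4)

dim = 16
for n in (4, 8, 12, 16, 20, 32, 64):
    V = rng.normal(size=(n, dim))
    s = rng.normal(size=n)
    w, *_ = np.linalg.lstsq(V, s, rcond=None)
    print(n, np.linalg.norm(w))
# |w| (and hence noise sensitivity) tends to rise toward n = dim,
# peak around exact capacity, and fall again over capacity.
```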
In the example code below you can train <vector, scalar> associations for a 3-term weighted sum.
The weight vector is the orange line with a large white dot.
The training vectors are shown as green lines. The target scalar value is plotted as a small-sized gray dot along the vector.
The gray dot isn't really oriented with the vector; it is simply placed there so that you know which vector it belongs to, and its value is just its distance from the origin. If its value is negative or very large it no longer appears on the visible vector line segment, which may give it some appearance of floating in outer space.
The mid-sized red dots are the actual scalar values recalled using the weighted sum, which obviously become inexact if you try to store more than 3 <vector, scalar> associations.
If you store, say, 3 associations and then take 1 away, the solution is still correct but the weight vector is longer than necessary. The training algorithm will not change it, because the recall is already correct.
You can experiment with weight decay to allow the magnitude of the weight vector to shrink back down.
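A sketch of that experiment, assuming a simple iterative delta-rule trainer with an optional weight-decay term (the demo's actual training algorithm may differ):

```python
import numpy as np

rng = np.random.default_rng(5)

def train(V, s, w, lr=0.1, decay=0.0, steps=20_000):
    # Delta rule with weight decay: nudge w to reduce recall error,
    # while the decay term continually shrinks w toward zero.
    for _ in range(steps):
        i = rng.integers(len(V))
        err = np.dot(V[i], w) - s[i]
        w = w - lr * (err * V[i] + decay * w)
    return w

V = rng.normal(size=(3, 3))
s = np.array([1.0, -2.0, 0.5])
w3 = train(V, s, np.zeros(3))             # store 3 associations

# Take one association away and keep training on the remaining 2.
w_plain = train(V[:2], s[:2], w3.copy())              # no decay
w_decay = train(V[:2], s[:2], w3.copy(), decay=0.01)  # with decay
print(np.linalg.norm(w_plain), np.linalg.norm(w_decay))
print(V[:2] @ w_plain, V[:2] @ w_decay, s[:2])
# Without decay the old, longer weight vector persists (it is still
# correct, so the error term never moves it). With decay it shrinks,
# at the cost of slightly approximate recall.
```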
Click on the image to position it in 3D.
You can check the noise sensitivity of a weighted sum by probing it with independent, identically distributed random variables.
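For example (a sketch; any trained weight vector can be probed the same way):

```python
import numpy as np

rng = np.random.default_rng(6)

w = rng.normal(size=3)            # stand-in for a trained weight vector

# Probe with i.i.d. unit-variance random inputs: the measured output
# variance should match |w|^2 from the variance equation, so the
# weight vector magnitude is a direct readout of noise sensitivity.
probes = rng.normal(size=(100_000, 3))
print(np.var(probes @ w), np.dot(w, w))
```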