Kalman Filter
This is an oversimplified tutorial explaining the idea of the Kalman filter with only one variable. Two useful links:
https://simondlevy.github.io/ekf-tutorial/
https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/
Conclusion first:
The Kalman filter in this simple tutorial uses two main sources of information for estimation:
The original prediction function, based on the law of motion or other physical rules.
This derives the next state from the current state deterministically, assuming no other factors or noise. This is the perfect scenario. For example, the next location can be derived from the previous location + speed * travel duration (see the sketch after this list).
The actual observations.
As observations have noise, the Kalman Filter uses the average noise and the prediction error to calculate a Kalman Gain, which determines how much of an observation is passed into the estimate. A portion of the estimate comes from the original prediction, and a portion comes from the observation; the two are combined into the final estimate. The Kalman gain is simply a trade-off factor determining those portions.
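As a minimal sketch of the first source, the deterministic prediction (the numbers and variable names here are made up for illustration):

previous_location = 100.0  # meters
speed = 5.0  # meters per second
travel_duration = 2.0  # seconds
next_location = previous_location + speed * travel_duration  # law of motion
print(next_location)  # 110.0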
In a nutshell, the Kalman Filter is a kind of weighted average of the prediction by the law of motion and the observation, where the weight is derived from the noise and error levels.
Caveat: this is only my understanding
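In code, this weighted-average view is roughly the following sketch, where g stands for the Kalman gain introduced below (the numbers are made up):

def weighted_estimate(prediction, observation, g):
    # (1 - g) of the estimate comes from the prediction, g from the observation
    return (1 - g) * prediction + g * observation

print(weighted_estimate(750.0, 814.0, 0.2))  # 762.8, leaning toward the prediction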
Predict current state
x(k) is the current state at time k
x(k) can be predicted from x(k-1) at the previous step using a linear calculation:
x(k) = a * x(k-1)
For example, suppose a plane is landing and its altitude follows x(k) = 0.75 * x(k-1), i.e. the altitude is 75% of the altitude from the previous time step (a linear state-transition rule). This is the part that predicts where x(k) should be based on physical laws / rules. The Kalman filter will later further update this prediction using the concept of the Kalman Gain and the known observations.
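A quick sketch of this prediction rule over a few time steps (illustrative numbers only):

altitude = 1000.0
for k in range(1, 5):
    altitude = 0.75 * altitude  # x(k) = 0.75 * x(k-1)
    print(k, altitude)  # 750.0, 562.5, 421.875, 316.40625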
In theory, the state x(k) can be estimated from the observation z(k) with the noise v(k) removed:
x(k) = z(k) - v(k)
However, we don't know the individual noise v(k) for the observation z(k) at time k. It's by definition unpredictable. So forget about the noise for now.
Estimate current state from observation
Fortunately, Kalman had the insight that we can estimate the current state x^(k) from the current observation z(k) and the previous estimated state x^(k-1). The ^ denotes an estimated state instead of an actual state.
x^(k) = x^(k-1) + g(k) * (z(k) - x^(k-1))
The idea is that the difference between the current observation z(k) and the previous estimate x^(k-1) provides extra information, e.g. the change in altitude from step k-1 to step k. How much that difference contributes to the estimate x^(k) depends on a trade-off factor g(k), which is called the Kalman Gain.
Kalman Gain
The gain g(k) is subject to the observation noise. If the noise is very low, the gain should be large (--> 1). If the noise is very high, the gain should be small (--> 0). So when the noise is too high, the Kalman gain goes to zero and the observation is useless; we are better off keeping the estimate x^(k) the same as x^(k-1).
x^(k) = x^(k-1) + 0 * (z(k) - x^(k-1))
= x^(k-1)
If there is no noise, the Kalman gain is one, and the previous estimate is useless because the observation is fully reliable. Simply use the current observation z(k) as the estimate x^(k):
x^(k) = x^(k-1) + 1 * (z(k) - x^(k-1))
= z(k)
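A small sketch of the two extremes (plus the middle ground), with made-up numbers:

x_prev = 750.0  # previous estimate x^(k-1)
z = 814.0  # current observation z(k)
for g in (0.0, 0.5, 1.0):
    x = x_prev + g * (z - x_prev)
    print(g, x)  # g=0 keeps x^(k-1) (750.0); g=1 takes z(k) (814.0)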
How to calculate Kalman Gain
The individual noise of an observation z(k) is unknown, but the average noise of the sensor that takes the measurements is usually known, e.g. from the accuracy published by the manufacturer. Let's say the average noise of the sensor is r:
g(k) = p(k-1) / (p(k-1) + r)
Here another factor is introduced: the prediction error p (p(k-1) in the formula above), which is the difference between the estimated/predicted value and the true value.
If the prediction error is zero, then the gain is 0 / (0 + r) = 0. The estimate has no error, so there is no need for the observation, i.e. x^(k) = x^(k-1) + 0 * (z(k) - x^(k-1)) = x^(k-1)
If the prediction error is one, then the gain is 1 / (1 + r). Now if the average noise r is 0, i.e. the sensor is perfectly precise, then the gain is 1 / (1 + 0) = 1, which means the observation is fully reliable.
x^(k) = x^(k-1) + 1 * (z(k) - x^(k-1)) = z(k)
If the noise is very large, 1 / (1 + r) tends to be very small, so the gain is small, and therefore
x^(k) tends to be ~ x^(k-1) + 0 * (z(k) - x^(k-1)) = x^(k-1)
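A sketch of how the gain formula behaves for a few prediction-error / noise combinations (illustrative numbers):

for p, r in [(0.0, 200.0), (1.0, 0.0), (1.0, 200.0), (1.0, 0.01)]:
    g = p / (p + r)  # g(k) = p(k-1) / (p(k-1) + r)
    print(f"p={p}, r={r} -> g={g:.4f}")
# p=0 -> g=0 (trust the prediction); r=0 -> g=1 (trust the observation)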
Technically, the average noise r is the variance (squared standard deviation) of the noise. The prediction error p(k) is the "covariance" of the estimation process at step k, i.e. how much the estimates deviate from the actual values. It seems that the prediction error is related to the diagonal elements (variances) of the covariance matrix, while the off-diagonal elements (covariances) only indicate whether pairs of variables move together. Only the diagonal elements (variances) are related to the uncertainty / prediction error; covariance itself is not directly a measure of uncertainty.
How to calculate prediction error
The prediction error at step k depends on the Kalman Gain calculated from the prediction error at step k-1 and the average sensor noise.
p(k) = (1 - g(k)) * p(k-1)
Why? Recall the estimate using Kalman gain:
x^(k) = x^(k-1) + g(k) * (z(k) - x^(k-1))
= (1 - g(k)) * x^(k-1) + g(k) * z(k)
The prediction error is propagated from x^(k-1) to x^(k) with a factor of (1 - g(k)), therefore:
p(k) = (1 - g(k)) * p(k-1)
When the Kalman gain g(k) is zero, no information comes from the observation, so the prediction error remains the same. If the Kalman gain g(k) is one, the estimate is purely based on the observation, and thus the prediction error is zero.
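A sketch of this error update over a few steps (the values of p and r are made up):

p, r = 1.0, 200.0
for k in range(1, 4):
    g = p / (p + r)  # Kalman gain from the current error
    p = (1 - g) * p  # p(k) = (1 - g(k)) * p(k-1)
    print(k, round(g, 6), round(p, 6))
# the error shrinks a little whenever the observation contributes (g > 0)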
Two estimate equations
Now we have two equations for estimating the state.
The original equation derived from the law of motion or the physical mechanism.
x(k) = a * x(k-1)
The equation based on Kalman gain and sensor observations.
x^(k) = x^(k-1) + g(k) * (z(k) - x^(k-1))
We need both equations to estimate the state, as they are based on different kinds of information. The first equation represents a PREDICTION of what the state should be according to the physical rules. The second equation represents an UPDATE to the estimate based on an observation.
Therefore, we turn equation 1 into an estimate function:
x^(k) = a * x^(k-1)
When the state is multiplied by a factor/matrix, the prediction error / covariance matrix of the state variables is:
p(k) = a * p(k-1) * a
You will learn later, in the matrix form, that this is equivalent to A * P * A^T (where A^T is the transpose of A).
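A sketch of the scalar form and, as a preview, the matrix form (the matrices here are made up for illustration; numpy is only needed for the matrix preview):

import numpy as np

a, p = 0.75, 1.0
print(a * p * a)  # scalar form: p(k) = a * p(k-1) * a = 0.5625

A = np.array([[0.75, 0.0], [0.0, 1.0]])  # illustrative transition matrix
P = np.eye(2)  # illustrative covariance matrix
print(A @ P @ A.T)  # matrix form: A * P * A^T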
Finally, the Kalman Filter algorithm for single-variable linear prediction (implemented in full in the Python Code section below):
Step 1. Prediction:
1.1. Apply the physical rules. Estimate the current state based on the previous state and the rules. The rules can be more complex than "a * x" and take the form of some function.
x^(k) = a * x^(k-1)
1.2. Calculate the prediction error. At the start, the error can default to 1, and it is updated as the estimation goes on.
p(k) = a * p(k-1) * a
Step 2. Update:
2.1 Calculate the Kalman Gain. (At the very first step there is no gain; the initial estimate is taken directly from the first observation.) The p(k) used here was just calculated in the prediction above, so it is really the initial prediction error at step k; remember this is the update within the current step k.
g(k) = p(k) / (p(k) + r)
2.2 Update the estimate on top of the current prediction x^(k).
x^(k) = x^(k) + g(k) * (z(k) - x^(k))
2.3 Update the prediction error, again based on the current prediction error at step k.
p(k) = (1 - g(k)) * p(k)
Appendix:
Covariance / Prediction error
The covariance between two variables X and Y, Cov(X, Y), can be calculated by taking the expected value (mean), E, of the product of two deviations: the deviation of X from its mean μX and the deviation of Y from its mean μY. That is,
Cov(X, Y) = E[(X − μX)(Y − μY)]
The covariance can also be expressed as the expected value of the variables' product minus the product of each variable's expected value:
Cov(X, Y) = E(XY) − E(X)E(Y)
Covariance is intrinsically related to correlation, another measure of the relationship between two variables. However, covariance may not be the most effective tool for conveying information about the relationship between two variables, since it is not normalized the way correlation is.
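A small sketch verifying that the two covariance formulas above agree (the data is made up):

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # E[(X - muX)(Y - muY)]
cov2 = sum(x * y for x, y in zip(xs, ys)) / n - mx * my  # E(XY) - E(X)E(Y)
print(cov1, cov2)  # both 2.75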
Python Code
Below is a Python script that implements this simple Kalman Filter algorithm. The output shows the estimate (along with the prediction error and Kalman gain) for each time step.
import pandas as pd

# sample data
xk_calculations = [1000, 750, 563, 422, 316, 237, 178, 133, 100, 75]  # calculated by x(k) = 0.75 * x(k-1), with 10 time steps
zk_observations = [926, 814, 608, 361, 357, 168, 39, 309, 177, -10]  # the observations for all time steps

xhk = []  # the estimated states for all time steps
pk = []  # the prediction errors for all time steps
gk = [None]  # the Kalman gains for all time steps; no gain at the start, step 1

# initialization
_xhk = zk_observations[0]  # the initial estimated state = the first observation
xhk.append(_xhk)
_pk = 1  # the prediction error defaults to 1 at the start, step 1
pk.append(_pk)
a = 0.75  # x(k) = a * x(k-1)
r = 200  # the average sensor noise of the observations

for i in range(1, len(zk_observations)):
    _zk = zk_observations[i]  # the observation for the current step
    ###### prediction ######
    _xhk = a * _xhk  # predict the current state from the previous estimate
    _pk = a * _pk * a  # calculate the prediction error
    ###### update ######
    _gk = _pk / (_pk + r)  # calculate the Kalman gain
    _xhk = _xhk + _gk * (_zk - _xhk)  # update the current state based on the Kalman gain
    _pk = (1 - _gk) * _pk  # update the prediction error based on the Kalman gain
    # record the estimated state, prediction error, and Kalman gain for the current step
    xhk.append(_xhk)
    pk.append(_pk)
    gk.append(_gk)

df = pd.DataFrame([xk_calculations, zk_observations, pk, xhk, gk],
                  index=['calculation', 'observation', 'prediction error', 'estimate', 'Kalman gain'],
                  columns=[str(i) for i in range(len(xk_calculations))])
print(df)