Kalman Filter

Smriti Gupta, Ophir Gal

Kalman filtering is an algorithm for estimating the true value of some variable of interest, and it typically produces estimates more accurate than those obtained from a series of measurements alone. It tends to converge on the true value more quickly than simply taking the mean of the measurements. It is an iterative process relying on a series of noisy measurements over time, where the estimate at each time step depends only on the estimate from the previous time step and the current measurement/observation. It is called a filter since it essentially filters out the noise. The filter is named after Rudolf Kalman, one of its prominent developers.

Kalman filtering is basically a two-step process: prediction and update. The algorithm is initialized with a first estimate (usually an educated guess). In the prediction step it projects the current estimate forward, and in the update step it compares this predicted value with the sensor's (noisy) measurement/observation. The two values are weighted and combined to yield an updated estimate, where the weights are determined by their relative uncertainties.

Animation of the Kalman Filter in action: the dotted red line represents the noisy measurements and the green curve is the filtered estimates (the horizontal axis represents time).

Formal Definition

  • X, the Mean State Vector, contains the quantities on which the measurements depend, e.g. position and velocity in the case of autonomous vehicles. The physical quantities are described by a Gaussian distribution with mean x.

  • P, the State Covariance Matrix, contains information about the uncertainty in the quantities on which the measurements depend.

  • k is the number of time steps.

  • Xk+1|k is the Prediction Step - the algorithm projects the current belief forward to predict the state at time k+1, before the measurement at k+1 arrives.

  • Xk+1|k+1 is the Update Step - after the prediction, the sensor measurement received at time k+1 is used to update the physical quantities.

A block diagram describing two iterations of the Kalman Filter process.

The Dynamic Model:

Xk = FkXk-1 + BkUk + Vk
where Xk is the estimate of our true state at time step k.

  • Xk-1 is the estimate of the state at time step k-1.

  • Fk is the state-transition model.

  • Hk is the observation model.

  • Qk is the covariance of the process noise.

  • Rk is the covariance of the observation noise.

  • Bk is the control input matrix.

  • Uk is the control variable.

  • Vk is the noise/error/uncertainty in the predicted state.

Prediction:

Predicted state estimate:

X' = FX+V ,

when you know X (the current estimate, initially from the first measurements) you can predict X' using this equation.

Increase in uncertainty (process noise):

P' = FPFT + Q,

where the process noise Q refers to the uncertainty in the prediction; it can be represented by a Gaussian distribution with a mean of zero.
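The prediction equations above can be sketched in a few lines of code. This is a minimal illustration; the function and variable names are my own, not from any particular library:

```python
import numpy as np

def predict(x, P, F, Q):
    """Kalman prediction step: x' = F x, P' = F P F^T + Q."""
    x_pred = F @ x                # propagate the state estimate forward
    P_pred = F @ P @ F.T + Q      # uncertainty grows by the process noise Q
    return x_pred, P_pred
```

Note that even with zero process noise, the covariance is reshaped by F, reflecting how uncertainty in one quantity (e.g. velocity) leaks into another (e.g. position).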


After getting sensor data,

Measurement Update:

Y = Z - H X' ,

where H is the measurement function that maps the predicted state to the measurement space.
For example, LIDAR only measures position while the Kalman filter models both position and velocity, so H is used to map values from the prediction domain to the measurement domain.

S = H P' HT + R,

where S is called the innovation covariance.

Kalman Filter Gain:

K = P’HTS-1,

it combines the uncertainty of where we think the correct state is, P', with the uncertainty of our measurement, R.

If R >> P', the sensor measurements are uncertain, and the Kalman filter gives more weight to the prediction.
If R << P', more weight is given to the sensor's measurement.
The measurement noise represents the uncertainty of the measurements.

New updated state:

X = X’+ K Y

Update covariance estimate:

P = ( I - K H) P’
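The measurement-update equations above can be sketched similarly (again a minimal illustration with names of my own choosing):

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """Kalman measurement update, following the equations above."""
    y = z - H @ x_pred                        # innovation Y = Z - H X'
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x_pred + K @ y                        # new updated state
    P = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred  # updated covariance
    return x, P
```

In the scalar case with equal prediction and measurement uncertainty, the gain comes out to 0.5, i.e. the estimate lands halfway between the prediction and the measurement.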

State Transition Function:

  • B is the control input matrix.

  • U is the control Vector.

  • B*U updates the state of our object (e.g. the position of a car) to account for some internal force (e.g. the car's motor) when we know how much acceleration and deceleration will occur over time.

  • V is the random noise.

  • B*U = 0 when the internal forces cannot be modeled.

Q update (Process Covariance matrix):

To model the stochastic part of the state transition matrix we have

P’ = FPFT + Q,

The state vector only tracks physical quantities such as the position and velocity of an autonomous vehicle. The Q matrix includes the time step, dt, to account for the fact that as more time passes we become more uncertain about the velocity and position. So as dt increases, we add more uncertainty to the state covariance matrix P. We can also model acceleration as random noise.
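As a sketch of how dt enters F and Q, here is the common constant-velocity model with acceleration treated as zero-mean random noise. The names and the noise variance are illustrative assumptions, not from the text:

```python
import numpy as np

def cv_model(dt, var_a=1.0):
    """Constant-velocity model for a state [position, velocity].

    Acceleration is modelled as zero-mean noise with variance var_a,
    so Q grows with dt: longer gaps between measurements mean more
    uncertainty about where the object ended up.
    """
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = var_a * np.array([[dt**4 / 4, dt**3 / 2],
                          [dt**3 / 2, dt**2]])
    return F, Q
```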

For illustration, let's look at a diagram describing the use of Kalman Filter with LIDAR and RADAR data to estimate position and velocity:

As you can see in the diagram, the covariance and state matrices are generated after the first measurement, and a prediction is made. The state is then updated as new measurements come in.


Vanilla Kalman filters assume that Gaussian distributions and linear functions underlie the data. Most real-world problems involve a non-linear relationship between the prediction and update phases, since they involve angles and other factors, so vanilla Kalman filters cannot be used: when you feed a Gaussian distribution through a non-linear function, the result is a non-Gaussian distribution.

Key Results

It has been shown that Kalman Filters are optimal linear filters (i.e. they minimize the expected mean squared error between the true value and the estimated value) under the following assumptions: (1) the true system generating the values is exactly described by the model used in the filter, and (2) the noise in the system is uncorrelated and the noise covariances are known. Autocovariance least-squares is one of several methods that have been developed to estimate these covariances.

The performance of the filter can be evaluated once you obtain estimates of the covariance. The optimality or performance of the filter is indicated by the "whiteness" of the noise of the output prediction error, which can be assessed in a number of ways. There are also methods for measuring the performance when the noise terms are non-Gaussian distributed using probability inequalities or large-sample theory [6], [7].

Connection to Decision-Making in Robotics

Kalman filtering has many applications in technology. One of the main applications is in the guidance, navigation and control of vehicles, especially aircraft and space vehicles (where navigation systems have difficulty verifying the location using waypoints). In fact, it was used in the Ranger, Mariner, and Apollo missions of the 1960s. In particular, the on-board computer that guided the moon landing of the Apollo 11 lunar module used a Kalman filter. It is also widely applied in signal processing and econometrics.

The filter is also used in robotics for motion planning and for sensor fusion in localization. For example, you may want to navigate your autonomous vehicle using a series of noisy readings from LIDAR and RADAR sensors, which contain errors and inaccuracies that could lead to collisions. Using Kalman filters, we can produce better estimates from a relatively small number of measurements over time instead of relying on the last measurement alone. It also helps in solving MDP problems, where reliable state estimates are needed.

Non-Linear Variants

Extended Kalman Filter

For non-linear functions, an extended version of Kalman filter is used. It linearizes non-linear functions by employing a multivariate Taylor series approximation.

Linearizing the H function

One can linearize the H function by taking a Taylor series approximation of the non-linear function.
This approximation is tangent to H at the mean location of the original gaussian.

A non-linear function with its linearized approximation.

Source: http://cognitoware.com/tutorials/ExtendedKalmanDifferentialDrive.htm

First Order Taylor Expansion

  • H(x) ≈ H(a) + DH(a)(x−a)

  • H(a) - the non-linear function evaluated at 'a', the best available prediction of the state.

  • DH(a)(x−a) - the linear extrapolation around 'a'.

  • H(x) = arctan(x) - an example non-linear function that projects the predicted state x onto the measurement space.

  • DH(a) - the Jacobian matrix of H evaluated at 'a'.
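To make the expansion concrete, here is the arctan example linearized around a point a. This is a minimal sketch; the expansion point 0.5 is an arbitrary choice for illustration:

```python
import numpy as np

def h(x):
    """Non-linear measurement function from the text: arctan."""
    return np.arctan(x)

def linearize(a):
    """First-order Taylor expansion: H(x) ≈ H(a) + DH(a)(x - a).

    For arctan, the derivative DH(a) is 1 / (1 + a^2).
    """
    slope = 1.0 / (1.0 + a**2)
    return lambda x: h(a) + slope * (x - a)

approx = linearize(0.5)   # tangent line at a = 0.5
```

The approximation is exact at the expansion point and degrades as x moves away from it, which is exactly why the EKF struggles with highly non-linear functions.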


For Extended Kalman Filter, the function does not need to be linear, but it should be differentiable.

xk = f(xk-1) + wk

zk = h(xk) + vk

Here wk and vk are multivariate Gaussian noises with mean 0 and covariances Q and R respectively. Because the non-linear measurement function cannot be combined with the state directly, its Jacobian is used; at each time step a new Jacobian matrix is computed at the current estimate to produce the update.


Predicted state estimate: x' = f(x, u); when you know x you can predict x' using this equation.
The increase in uncertainty, P' = FPFT + Q, represents the uncertainty in the prediction step, which can be represented by a Gaussian distribution with mean zero.


After getting sensor data we perform the measurement update:

y = z - h(x),
where h is the measurement function that maps the predicted state into measurement space.

S = H P’ HT + R

Kalman Filter Gain:

K = P'HTS-1; it combines the uncertainty of where we think the correct state is, P', with the uncertainty of our measurement, R.

If R >> P', the sensor measurements are uncertain and the Kalman filter gives more weight to the prediction.
If R << P', more weight is given to the sensor's measurement. Measurement noise refers to the uncertainty in the measurements.

New updated state:

x = x'+ K y

Update covariance estimate:

P = ( I - K H) P’

where the state transition matrix and observation matrix are Jacobians F = f'(x) and H = h'(x).
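When the analytic Jacobians F = f'(x) and H = h'(x) are awkward to derive by hand, they can be approximated numerically. A central-difference sketch (my own helper, not tied to any particular library):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x with central differences."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps                 # perturb one state dimension at a time
        J[:, i] = (np.atleast_1d(f(x + d)) - np.atleast_1d(f(x - d))) / (2 * eps)
    return J
```

For the arctan measurement function, the Jacobian at 0 comes out to 1, matching the analytic derivative 1/(1 + x^2).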

As you can see in the figure, Extended Kalman Filter (EKF) matrices are first generated for the input data. In the prediction step, new EKF matrices, the covariance matrix, and the state transition matrix are generated to make the first prediction. For RADAR data, Jacobian matrices are computed first since the measurements are non-linear, while for LIDAR the vanilla Kalman filter suffices. The state is then updated with new measurements.


The linearization used by Extended Kalman filters can be a poor approximation, and the filter may then fail to produce a good estimate. It can produce a covariance matrix that deviates substantially from the covariance implied by the true measurements. For further improvement, particle filters and Monte Carlo methods are commonly used.

Unscented Kalman Filter

This is an alternative to the Extended Kalman Filter (EKF) for non-linear functions. When the f and h functions are highly non-linear, the EKF performs poorly because the linearization error propagates into the covariance (or uncertainty) estimate.

It uses a minimal set of sigma points, which are deterministically chosen points around the mean. They are then used to predict a new mean and covariance estimate. This deterministic process removes the Jacobian calculation and recovers the mean through the unscented transformation. Here, we need to find a normal distribution, as close to the true distribution as possible, that matches its mean and covariance.

Sigma Points

While an entire distribution is hard to push through a non-linear transformation, individual points of the state space are easy to transform. Sigma points are chosen around the mean state, in a fixed relation to the variance of every dimension of the state.


The algorithm follows two steps as before: prediction and update.

Process model:

xk+1 = f(xk, νk)

Prediction - done in three steps: generating sigma points, predicting new sigma points and predicting new mean and covariance.

Update - done in two steps: measurement prediction and a state update.

nx - dimension of the state vector, i.e. the number of physical quantities tracked.

nσ - number of sigma points = 2*nx + 1, with the first point being the mean of the distribution.

Diagram describing the Unscented Kalman Filter algorithm.

(chi) χk|k = [ xk|k,  xk|k + sqrt((λ+nx)Pk|k),  xk|k - sqrt((λ+nx)Pk|k) ] - generation of sigma points

xk|k - the mean state estimate.

xk|k + sqrt((λ+nx)Pk|k) - nx sigma points, one per column of the matrix square root.

xk|k - sqrt((λ+nx)Pk|k) - nx sigma points in the opposite direction.

λ - a design parameter that determines the spread of the sigma points. A large value of λ moves the sigma points away from the mean state and a smaller value moves them toward it.

νk - the process noise vector. To determine its value we only need to know the stochastic properties of the noise.

[ xk|k | νk ] - the augmented state vector, formed by stacking the state vector and the process noise vector; its covariance is the augmented covariance matrix. Sigma points for the augmented state are calculated in the same way (the augmented dimension is denoted na below).

Predicted mean:

xk+1|k = Σ(i=0..2na) ωi * χk+1|k,i

Predicted Covariance:

Pk+1|k = Σ(i=0..2na) [ ωi * ( χk+1|k,i − xk+1|k ) * ( χk+1|k,i − xk+1|k )T ]

ω0 = λ/(λ+na), for i = 0

ωi = 1/(2(λ+na)), for i = 1, 2, ..., 2na
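The sigma-point generation and weights above can be sketched as follows, using a Cholesky factor as the matrix square root (names are illustrative, and the noise augmentation is omitted for brevity):

```python
import numpy as np

def sigma_points(x, P, lam):
    """Generate 2*n + 1 sigma points: the mean, plus and minus the
    scaled columns of the matrix square root of P."""
    n = x.size
    L = np.linalg.cholesky((lam + n) * P)   # matrix square root
    pts = [x]
    for i in range(n):
        pts.append(x + L[:, i])
        pts.append(x - L[:, i])
    return np.array(pts)

def sigma_weights(n, lam):
    """w_0 = lam/(lam+n); w_i = 1/(2(lam+n)) for the remaining 2n points."""
    w = np.full(2 * n + 1, 1.0 / (2.0 * (lam + n)))
    w[0] = lam / (lam + n)
    return w
```

A useful sanity check: the weighted mean of the sigma points recovers the original mean exactly, and the weighted spread recovers the original covariance.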


Measurement model:

Zk+1 = h(xk+1) + wk+1

This is the same process as in prediction, except that instead of generating new sigma points we reuse those from the prediction step. Here the measurement noise wk+1 has an additive effect, so we do not need to compute an augmented matrix; R is simply added to S. We transform the sigma points into measurement space and calculate the mean and covariance.

Sk+1|k = Σ(i=0..2na) [ ωi * ( Zk+1|k,i − zk+1|k ) * ( Zk+1|k,i − zk+1|k )T ] + R

Update Step:

For this we need the actual measurement from the sensor.

Kalman Gain:

Kk+1|k = Tk+1|k Sk+1|k-1

State Update:

xk+1|k+1 = xk+1|k + Kk+1|k * ( Zk+1 − zk+1|k )

Covariance Matrix update :

Pk+1|k+1 = Pk+1|k − Kk+1|k * Sk+1|k * KTk+1|k

Cross Correlation between sigma points in measurement space and state space:

Tk+1|k = Σ(i=0..2na) [ ωi * ( χk+1|k,i − xk+1|k ) * ( Zk+1|k,i − zk+1|k )T ]

( χk+1|k,i − xk+1|k ) - deviation of the predicted sigma points in state space.

( Zk+1|k,i − zk+1|k ) - deviation of the predicted sigma points in measurement space.
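The update step above (cross-correlation, gain, then state and covariance updates) can be sketched as a single function, with the sigma points and their measurement-space images assumed precomputed (all names are illustrative):

```python
import numpy as np

def ukf_update(x_pred, P_pred, chi, Z_sig, z_pred, S, w, z):
    """UKF update: cross-correlation T, gain K = T S^-1, then
    x <- x + K (z - z_pred) and P <- P - K S K^T."""
    T = sum(w[i] * np.outer(chi[i] - x_pred, Z_sig[i] - z_pred)
            for i in range(w.size))              # cross-correlation matrix
    K = T @ np.linalg.inv(S)                     # Kalman gain
    x_new = x_pred + K @ (z - z_pred)            # state update
    P_new = P_pred - K @ S @ K.T                 # covariance update
    return x_new, P_new
```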

Important Applications

  1. Object tracking problems in computer vision such as the Lucas-Kanade tracker, visual odometry, and lane detection for autonomous vehicles.

  2. Sensor fusion: experiments and robots using multiple sensors (such as LIDAR, RADAR, IMUs, etc.) use Kalman filtering to produce better estimates.

  3. Satellite Navigation.

  4. Weather forecasting.

  5. Solving continuous state space or discrete state space MDPs.

Open Research Questions

Some open problems include: optimizing delay tolerance, adaptive Kalman filters, devising solutions for different types of MDPs, and more. To improve performance for highly non-linear and more realistic models, other methods such as particle filters and Monte Carlo methods are used.