The Curious Case of Kalman Filters
What are Kalman Filters?
Kalman filter finds the most optimum averaging factor for each consequent state. Also somehow remembers a little bit about the past states. It performs a joint probability distribution over the variables for each timeframe. The algorithm uses the new mean and new variance for every step in order to calculate the uncertainty and tries to provide accurate measurements for each timeframe of the measurement update (sensing/prediction) and the motion update(moving). The algorithm also uses other inaccuracies and statistical noise in order to represent the initial uncertainty.
Purpose of Kalman Filters:
- Transform data input from various sensors like LiDAR and Radar trackers into a usable form.
- To Calculate inferring velocity.
- Reduce measurement error(noise) of the target’s position and velocity.
- Predict the future state of the target using previous state estimate and new data.
- Lightweight, Robust and expandable algorithm.
- Estimates a continuous state and as a result, Kalman Filters happens to give us an Uni-modal distribution.
Working of Kalman Filters
A Kalman filter gives us a mathematical way to infer velocity from only a set of measured locations. So, here I’m going to create a 1D Kalman filter that takes in positions, takes into account uncertainty, and estimates where future locations might be and the velocity of an object. Further, if we want to understand how Kalman filter works, we first need to know a bit about Gaussians which represents the Uni-modal distribution in the Kalman filters.
Gaussian is a continuous function over the space of location and the area underneath sums up to 1.
The Gaussian is characterized by two parameters, the mean, often abbreviated with the Greek letter Mu(μ), and the width of the Gaussian often called the variance i.e. Sigma square ( σ²). So, our job in common phases is to maintain a Mu(μ) and a Sigma square( σ²) as our best estimate of the location of the object we are trying to find. Also, remember that the larger the width, the more uncertainty it possesses.
The diagram in the above image represents the mean(μ) and variance(σ²) of the gaussian. The taller the mean(μ), the chances of the object present at that position are higher. conversely, if the variance(σ²) is larger, i.e. wider the distribution, higher the uncertainty of that object; that might be positioned at any place within the gaussian. And as far as the formula is concerned, it is an exponential of a quadratic function where we take the exponent of the expression. The quadratic difference of our query point x, relative to the mean(μ), divided by sigma square(σ²), multiplied by -(1/2). Now if x = μ, then the numerator becomes 0, and if x of 0, which is 1. It turns out we have to normalize this by a constant, 1 over the square root of 2 Pi(π) sigma square(σ²).
Gaussians are exponential function characterized by a given mean(μ), which defines the location of the peak of a Gaussian curve, and a variance(σ²) which defines the width/spread of the curve. All Gaussian are:
- they have one peak, which is also referred to as a “unimodal” distribution, and they have an exponential drop off on either side of that peak.
More on Variance:
The variance is a measure of Gaussian spread; larger variances correspond to shorter Gaussians. Variance is also a measure of certainty; if you are trying to find something like the location of a car with the most certainty, you’ll want a Gaussian whose mean is the location of the car and with the smallest uncertainty/spread.
Let’s write a Gaussian function:
Shifting the Mean:
In Kalman filters, we iterate through measurement (measurement update) which uses Bayes rule, which is nothing else but a product or multiplication and through motion update(prediction) in which we use total probability which is a convolution or simply an addition.
In order to understand the cycle, let’s assume that we are localizing a vehicle and we have a prior distribution(blue gaussian); it is a very wide Gaussian with the mean. and now let’s say we get a measurement(orange gaussian) that tells us something about the localization of vehicle. This is an example in our prior we were fairly uncertain about the location but the measurement told us quite a bit as to where the vehicle is.
Note: In the above diagram, Mu(μ) is the prior mean and Nu(v) is the new measurement mean.
The final mean gets shifted which is in between the two old means, the mean of the prior, and the mean of the measurement. It’s slightly further on the measurement side because the measurement was more certain as to where the vehicle is than prior. The more certain we are, the more we pull the mean on the direction of the certain answer.
Where is the new peak?
The resulting Gaussian is more certain than the two-component Gaussians i.e. the covariance is smaller than either of the two covariances in the installation. Intuitively speaking, this is the case because we actually gain information. The two Gaussians together with high information content in either Gaussian installation.
- Suppose we multiply two Gaussians, as in Bayes rule, a prior and a measurement probability. The prior has a mean of Mu(μ) and a variance of Sigma square(σ²), and the measurement has a mean of Nu(v) and covariance of r-square(r²).
- Then, the new mean, Mu prime(μ′), is the weighted sum of the old means. The Mu(μ) is weighted by r-square(r²), Nu(v) is weighted by Sigma square(σ²), normalized by the sum of the weighting factors. The new variance term would be Sigma square prime(σ²′).
- Clearly, the prior Gaussian has a much higher uncertainty, therefore, Sigma square(σ²) is larger and that means the Nu(v) is weighted much much larger than the Mu(μ). So, the mean will be closer to the Nu(v)than the Mu(μ). Interestingly enough, the variance term is unaffected by the actual means, it just uses the previous variances.
A new Mean(μ′) is your old Mean Mu(μ) plus the motion often called u. So, if you move over 10 meters in the x-direction, this will be 10 meters and you knew Sigma square prime(σ²′) is your old Sigma squared(σ²) plus the variance(r²) of the motion Gaussian. This is all you need to know, it’s just an addition. The resulting Gaussian in the prediction step just adds these two things up, mu(μ) plus u and sigma squared(σ²) plus (r²).
The Filter Pipeline:
- So now let’s put everything together. Let’s write the main program that takes these 2 functions, update and predict, and feeds into a sequence of measurements and motions. In the example I’ve chosen, measurements = 5., 6., 7., 9., 10. and motions are 1., 1., 2., 1., 1. This all would work out really well if the initial estimate was 5, but we’re setting it to 0 with a very large uncertainty of 10,000.
- Let’s assume the measurement uncertainty is constant 4, and the motion uncertainty is constant 2. When you run this, your first estimate for the position should basically become 5–4.99, and the reason is your initial uncertainty is so large, the estimate is dominated by the first measurement. Your uncertainty shrinks to 3.99, which is slightly better than the measurement uncertainty. You then predict that you add 1, but the uncertainty increases to 5.99, which is the motion uncertainty of 2.
- You update again based on measurement 6, you get your estimate of 5.99, which is almost 6. You move 1 again. You measure 7. You move 2. You measure 9. You move 1. You measure 10, and you move a final 1. And outcomes as the final result, a prediction of 10.99 for the position, which is your 10 position moved by 1, and the uncertainty(residual uncertainty) of 4.
Output: when the initial uncertainty is high
Plotting the Gaussian:
How the Filter actually works:
Remember that measurement_sig(var2 or σ²) = 4 in measurement update & motion_sig(var2 or σ²) = 2 in motion update(prediction) will remain constant. Other than that each and every variable in the code will be updated at each timestep. for example, in the for loop of the filter pipeline, the mu and sig values will get updated in the measurement update and the new updated mu and sig values are then fed into the motion update(predict) function. then again when the new values are generated by the motion update(predict) function, those updated values are then fed into the measurement update(update) function and the cycle goes on till it calculates uncertainty for each and every timestep of the vehicle/object.
Feel free to mess around with the code and understand what is really happening when you change a particular variable’s value. Also, change the initial uncertainty to a very low value such as 0.0000000001 in order to get a clear understanding of How the Kalman Filter works!
Check out the code on Github.