Miscellaneous‎ > ‎


Convolution is an operation which takes two functions as input, and produces a single function output (much like addition or multiplication of functions).  The method of combining these functions is defined as 

Where x, y both range over all of .  Here I will try and present convolution as a very convenient way of solving certain problems, and then abstracting from that; as a mathematical operation important in its own right.

We will start our investigation by looking at some waves.  Imagine you're standing on the edge of a pond, and you throw a small stone in.  The stone will cause a ripple to travel outwards across the surface:

What about if the small stone was heavier?  Well, intuitively we would expect a bigger ripple:

Or what about a lighter stone?

All three of the ripple patterns above share the same general shape, but differ only in their magnitude.  How do we go about expressing this observation mathematically?  First, let's abstract a bit: if we look at the impacting stone as our input, and the resulting ripple as our output, we then have a process to model:

We can express the dependence of the ripple's size on the weight of the stone by saying that the output scales linearly with the input:

This property lets us see that there is some "special" stone we could study: that of unit weight.  If we know the ripple caused by this stone, we can then find the ripple caused by any stone (lighter or heavier) by just scaling by the proper amount.  If we think of this stone as appropriately "small" (technically, a point), then mathematically we can model it by an impulse.  The wave caused by this unit weight pebble is called the impulse response, a relationship we can throw into symbols as follows:

Where   is the impulse and h symbolizes the ripple.  So far, by just knowing h we are able to find the output which results from an input of any magnitude.  What else can we do with it?  Well; a lot it turns out.  For one, so far we have been dropping our stones at the (mathematical) origin, but we intuitively expect the shape of our ripple to look the same no matter where we drop the rock.  We can express this by saying that if we translate the impulse, we translate the response:

What about if we drop two stones in at once, but at different locations?

The wave caused by two stones is just the sum of the waves produced by each stone.  In math-speak:

Together with the scaling property we had from above; this lets us say our ripples are linear.  To clear up notation from here on out; I'm going to replace the arrow with a function, w, so that instead of saying 

We will say

Let's say now that we have a handful of small pebbles of all different weights.  We throw them into the water and they spread out, impacting at all different locations.  How are we to model the complex ripple that will result?  Well, upon impact, the surface just experiences a sum of stones 

And each stone can be viewed as a scaled and translated version of our unit stone ().  This lets us say

Now, what does the wave that results look like?  We can just apply our "wave operator", w to the impact:

Which we can phrase more mathematically by

We are now in a position to use our previous knowledge of the linearity of our "wave operator" to write this more explicitly: we know the wave of a sum is the sum of waves, and the wave of a scalar multiple is the scaled wave of the impulse, so we can go ahead and express this symbolically

Now, we have expressed the resultant wave as just a scaled sum of various "unit waves", which we already have an expression for: h

Assuming we know the form of this function h then (the impulse response), we know the full form of our solution for the handful of pebbles:

What have we learned here?  Well, assuming we know what wave propagates from a unit impulse at the origin, we are able to figure out what the wave looks like for a collection of various impulses all over the place.  Given a difficult problem (what does the water's surface look like when you throw a handful of pebbles in), we were able to reduce it to a much simpler problem (what happens when you apply a unit impulse at the origin), and the answer to this question and from it construct the answer to our original problem.  

In fact, we can significantly expand on this observation; but it will be easier if we do it at a slightly higher level of abstraction.  What we have really been doing above is solving the problem of 

Where our evolution function w is both linear and translation-invariant (meaning we can break the input up as a sum of simpler parts if we'd like, and no matter where an impulse occurs the response looks the same).  But what if our input isn't a simple sum of impulses?  What if the input is a continuous function?  Such a case could easily arrive in our wave analogy, if instead of dropping a pointlike pebble, we instead dropped in an extended object, like a cardboard box or a rock.  If we would like to use the line of reasoning above, we need some way to break this complex input into impulses somehow:

If we can manage to do this, then we can solve our problem simply: we just need to apply w to each impulse individually, and sum the results!

To do this however, we are going to switch to a different physical analogy (don't worry, we will come back and solve the wave problem in a bit!).  The reason for our switch-up is just to simplify the problem at hand: with waves we have to worry about the time dependence of our response as it travels through space, which doesn't affect the nature of the problem at all.  After all, off in abstract-land all we are doing is dealing with problems of the form

Instead of water waves, lets say we are going to take a picture.  The setup is something like this: an object to be photographed, the camera lens system, and a piece of film.

The goal of our camera is to take light from the original object and deposit it on the lens; so we can think of our camera as a function from object to film.  If we call the object to be photographed the Source, the function to the image the Camera, and the film the Photograph, we can reduce this to a simpler schematic diagram.

If we shine a bright light at the camera, we expect a bright image; if we shine a dim light, we expect a dim image.  If we shine two lights, we expect two images.  So, we can say that our camera function is linear, much like the waves. We can symbolize this relationship

If our camera is in focus, we expect to see on the image exactly what what was around in the real world.  That is, we expect a 1-1 correspondance between the source plane and the photograph.

What if our camera is out of focus?  We know from experience that this means what appears to be a single point in the source (think a star) is a smudged blob on our photo.  Thus, we don't have a 1-1 correspondence anymore, single source points spread their influence to multiple image points:

To make things easier to keep track of, we will let h(x,y) stand for the blurry image of a star which was originally centered on the source screen (ie at (0,0)), after our crappy camera imaged it.  Since this star is rather point-like, we will treat it as an impulse and say

Or more succinctly

In exact analogy with our wave example, lets say now we take our camera and image the night sky.  The source plane (the cosmos) contains a multitude of stars, each of a different brightness.  What will our image look like?  Well, just knowing how a unit impulse at the origin is imaged, we can figure it out!  Viewing each star as a stretched and translated unit impulse; our image will be a collection of stretched and translated blurs.  We can decompose the night sky into a sum of these modified impulses, and then distribute our imaging function over the sum, blurring each star individually and adding the results.  

Symbolically say

Where mi is the brightness of each star, and (xi,yi) is its spatial location.  

Source Plane

If we want to determine what our photograph will look like, we can say

And again, we already know how C affects a unit impulse: it turns it into h!  This gives us our solution:


So far we have come up with two seemingly different problems; both which have a simple solution if we first understand how an "impulse" is propagated through them.  Now lets take a picture of a portion of the sky with even more stars in it: for example, this small globular cluster which orbits our home galaxy:

Source Plane

We can apply the exact same reasoning as above here: each star is like an impulse, so to see how each star will look on our final picture, we simply take that impulse, propagate it through the camera until it becomes our response function h, and then add all of them back together.  The resulting blurry photo looks like this:


It's time to make sure the mathematical formalism of this process really makes sense:  at the location of each impulse (star), we apply our "blur" function which models our camera's lack of focus.  This blur function spreads the light out from this original point source to a more extended region.  If our star is located on the source plane at point (xi, yi), we need to make sure that our final image of the star is centered on that location as well.  We defined our function h(x,y) to be the blur caused by a unit impulse at the origin, so we need to slide this impulse to occur at our new point (xi, yi) instead.  This is what the term 

says.  Its just a translated version of our blurred out impulse, originating from (xi, yi) instead of (0,0).  Now if the stars original brightness was not simply 1 but some other amount, we must scale the brightness of our blur accordingly.  Thus the image of a star of brightness mi located at (xi, yi) will be



So far, this has just been a more careful repetition of what we have done above.  What we are going to do now seems to be a simple change of symbols, but will actually give us some profound insight into convolution itself.  The number mi represents the brightness of our ith star, which is located at the point (xi, yi) on the source plane.  What this is really saying then is that the brightness of the source plane at location (xi, yi) is given by mi.  That is, we can define a function on the source plane which gives us the brightness at each point.  Lets call this function m(u,v).  Given a point (u,v) of our source plane (the night sky), m tells us what the brightness is there.  Of course, in our star examples so far most points of the plane have a brightness of zero (black sky).  But as we saw in the globular cluster, if we try to image a part of the sky with lots and lots of stars, some of them will seem to touch, and form extended areas of brightness.

In fact, in the limit lets say we find a point of the night sky where there are stars everywhere.  There are no dark points, every line of sight ends in a star.  Our brightness function m would be nonzero everywhere, and every single point of the sky would have an impulse (star) located at it.  Can we use what we have learned so far to figure out what our photograph of this area would look like?  Sure we can!  The only difference now is that we have a continuum of stars, instead of discrete points.  But, at each location (u,v) in the night sky, we have an impulse with brightness m(u,v).  We can represent this star by this brightness multiplied by an impulse function shifted to be located at (u,v):

This single point source dosen't propagate through our camera unchanged, however; it becomes blurred.  This blur is still centered around the original location of the star however, and scaled for brightness by m(u,v) so we can say the image of this particular star is

Just like in our discrete cases above, we now just need to sum up the images of each star to get the final image.  This sum needs to be taken over each (u,v) which has nonzero brightness, which in the present case is all of them!  We will need to perform a continuous sum then, over each position (u,v).

This last equation just says the image of our continuous field of stars is the blurred image of each star individually, all added back together.  However, this also happens to be a realization of the formula for convolution presented at the top of this post!  The resulting image of our star field is just the convolution of our brightness field with the blurred image of a unit star at the origin!

How about the image of a lizard?  Well, we were able to treat a continuous field of stars successfully by modeling it via a continuous "brightness" field.  What if we just view our lizard as a continuous field of brightness/color?  With the stars it was intuitive to say that each point was in reality an "impulse," which was the crux of our argument.  Does the same thing carry over here?  Is the image of a lizard really nothing more than a weighted collection of impulses?  Let's think for a second about a television set displaying our lizard on it's screen.  If you get really close to this image, you'll notice that its actually made out of teeny tiny pixels, its nothing more than a collection of tiny impulses!  If we shrink these pixels to zero size (as an ideal impulse is), then it seems we can really say that a lizard's image is nothing more than a collection of infinitly small pixels, or impulses!

Since we already know how our camera responds to a unit impulse, all we have to do to model the photo of our lizard with the blurry camera is to take this response, scale it by the brightness at each location, and add them all up.

(It's probably time for a new camera!)

The blurry image is then just the convolution of our original image of the lizard (the brightness field), and the response of the camera to a single impulse.

Alright then, back to the water waves.  What if we threw a larger object into the pond?  Instead of having a discrete set of impulses to start the water vibrating, we have a continuous field of applied pressure.  


Just like with the image of the lizard, we can look at this contiuous field as simply being a bunch of scaled impulses smushed up next to eachother.  To find the wave caused by this, we just need to find the wave caused by a single impulse, translate it, scale it, and add them all up


Again, we recognize this as a convolution of the wave caused by a single impulse and the pressure field of the extended object:

These two examples give us a bit of a feel for what convolution is.  If we know how a certain system reacts to a simple impulse, we can figure out how that system reacts to anything, by first breaking it down into impulses, sending each through individually, and adding them up at the end.  When we add them all up, we have to make sure to shift all the functions appropriately and scale them (so it matches up with the input) and taking care of this turns our sum into a convolution integral.  Convolution is just the continuous analog of the problem solving strategy "break it down into small parts, solve those, and put em back together at the end".

Convolution seems to be a quite common operation throughout mathematics though; so let's see if we can find other places where it arises to further broaden our intuition.  

Consider for a moment a metal rod which we have heated in some way.  If we let it sit out, it will obviously cool down, but how do we express this quantitatively?  Via a partial differential equation known aptly as the "heat equation", given the initial temperature of our rod (as a function of position), we can solve for its temperature at all future times.  

For a general initial temperature distribution this may be a hard thing to do.  So let's see if we can stick to the reasoning that's proved fruitful so far, and consider the effect one "impulse" of heat has on our rod.  The mathematical model for this is

The solution to this problem is called the "fundamental solution", and can be visualized as follows:

In this image, the red represents "hot" and the blue "cold".  At t=0, we apply an impulse of heat to our rod (think of touching a soldering iron to it), and as time progresses that heat spreads out and evens out, as we would expect it to.  We can alternatively choose to plot the heat distribution in a "standard" graph:

Where the vertical dimension gives the temperature at position x.

Solving the heat equation analytically for this particular initial condition, we can arrive at a closed form of this fundamental solution (here we will take the constant to be unity)

The most important part however is not the form of this solution, but rather the form of the problem:  the transformation of our initial impulse into a different function:

If we had decided to place the soldering iron at a different location than the origin, we would expect the same heat distribution to result, and so this process is translation invariant.  Had we placed two soldering irons, we would expect the result to be the sum of two dispersing heat waves, and if we had placed a hotter iron, we would expect a hotter rod; the heat equation is linear and translation invariant.  

In abstract-land, this problem is identical to both of the above.  We have some sort of a correspondence between an impulse and a function, and we can write this correspondence in a linear, translation-invariant manner.  Thus, we expect that if we heat the bar via some continuous distribution instead of an impulse, we can find the final solution to our problem by convoluting the initial temperature with the impulse response.  As an example lets say we heat the bar to the right of the origin, and cool it to the left so that the initial temperature distribution looks something like this:

Or, in symbols:

To solve for the temperature distribution caused by this initial condition, we will view f as being composed of a bunch of mini impulses, right next to eachother.  Feeding each impulse through the heat equation will give us a shifted and translated version of our fundamental solution, and adding them all back together will give us the answer we seek.  This is of course, just the convolution of the initial condition and the fundamental solution:

Since our initial condition is only nonzero over the finite range (-3,3), we can re-write this integral as

Performing this integral, we get

Which is much more easy to understand pictorally:

We can see qualitatively that the temperature distribution tends to "smooth out" as time progresses, much as we would expect.  This example provides some good mathematical justification for the use of convolution: in addition to it being intuitively simpler to break a problem into impulses and then just re-combine them, it's hard to see how to even come up with an analytic expression as complicated as the solution here if we had not first solved for the fundamental solution, and then gone used convolution formally to compute the desired answer.

There's one more type of problem where convolution is common in its solution:

How do we mathematically formalize the concept of an impulse?  We want this impulse to have a few specific properties

  • Be of unit "size"
  • Point-like
  • Centered at the origin

To satisfy the first property, we must first understand what we mean by the "size" of a function.  For our purposes we will say that size is equivalent to the area under that function; in symbols 

By being point-like and centered at the origin, we mean that our impulse is pretty much zero at all other points.  This fits well with our intuitive examples of impulses, the small pebbles and starlight.  How could we go about constructing some function like this? For a starting place, look at the normal curve:

By construction this curve has a total integral of 1, and it's roughly concentrated about the origin.  What can we do to make it more "point-like"?  The following animation shows a sequence of normal curves (each of total integral 1) with increasingly small standard deviations.

As the standard deviation decreases, the curve becomes more and more centered at the origin, which is precisely what we want.  It appears that if we shrink the standard deviation towards zero, the curve will collapse onto the origin into one single, infinitely thin spike.

Since all of the normal curves limiting towards this spike had area 1, we will say this spike has "area" 1 as well, it is the "unit" spike.  This guy will serve us perfectly as a unit impulse.

However, this isn't the only way to define our impulse.  What if instead of squishing normal curves towards the origin, we squished a unit square?  As its base gets smaller and smaller, it must grow taller and taller (to preserve the unit area inside of it) and again in the limit we will end up with a spike-like object centered on the origin, which we can say has unit area.  In fact, there are a bunch of different curves we could use in a limiting procedure like this, and our final notion of impulse is independent of that choice.