It is common practice to insert pooling layers between convolutional layers to reduce the spatial size of the representation and, with it, the amount of computation in the network. The most commonly used variant is the max-pooling layer, which applies a filter that outputs the maximum value in its receptive field. From Andrej Karpathy's notes on CNNs:
The pooling layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2 downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation would in this case be taking a max over 4 numbers (little 2x2 region in some depth slice). The depth dimension remains unchanged.
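The 2x2, stride-2 case described above can be sketched in a few lines of NumPy. This is an illustrative toy implementation, not a library routine; it assumes an input of shape `(H, W, D)` with even `H` and `W`, and the function name `max_pool_2x2` is invented for the example:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, D) array.

    H and W are assumed even; the depth dimension D is unchanged.
    """
    h, w, d = x.shape
    # Split each spatial axis into (blocks, 2) so every 2x2 region
    # sits on axes 1 and 3, then take the max over those axes.
    blocks = x.reshape(h // 2, 2, w // 2, 2, d)
    return blocks.max(axis=(1, 3))

x = np.arange(32, dtype=float).reshape(4, 4, 2)  # toy 4x4 input, depth 2
y = max_pool_2x2(x)
print(y.shape)  # (2, 2, 2): width and height halved, depth unchanged
```

Note that the output keeps 8 of the 32 input activations, matching the 75% discard rate mentioned in the quote: each MAX is taken over one 2x2 region within a single depth slice.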
Pooling also exploits the stationarity property of images: features that are useful in one region are likely to be useful in other regions as well. Pooling can therefore summarize the feature statistics at various locations in the image using, in this case, the maximum value.
Of course, there are other types of pooling layers, such as average pooling, as well as architectures that discard pooling entirely (see The All Convolutional Net). However, max-pooling remains by far the most common pooling operation in modern CNN models.
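Average pooling differs from max-pooling only in the reduction applied to each region. As a sketch under the same assumptions as before (an `(H, W, D)` input with even spatial dimensions; the function name is invented for illustration):

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on an (H, W, D) array.

    Same block layout as max pooling, but each 2x2 region is
    reduced by its mean rather than its maximum.
    """
    h, w, d = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, d).mean(axis=(1, 3))

x = np.arange(32, dtype=float).reshape(4, 4, 2)
y = avg_pool_2x2(x)
print(y.shape)  # (2, 2, 2), as with max pooling
```

Averaging keeps a smoother summary of each region, while the max keeps only the strongest activation, which is one intuition for why max-pooling tends to be preferred for detecting the presence of a feature.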