Stride is a way we control how much we slide the filter across the image. Below are two examples of different stride sizes. In the left animation, a stride of size 1 is used for a 4x4 image and 3x3 filter to produce a 2x2 output. On the right, the same filter with stride of size 2 is applied to a 5x5 image to also produce a 2x2 output. Obviously, even though the dimensions of the output are the same, the computed values will be very different.
A larger stride may be desirable to quickly shrink the dimension of the input if saving computation resources is of concern. However, this comes at the expense of ignoring more spatial information.
You may have noticed that the width and height shrink after passing through a convolutional layer. How do we ensure that this does not happen? We use zero-padding, which is the process of padding the input with some trivial value (i.e. zero) around the border. This ensures the outputs of convolution operations are not affected by the padding (i.e. element-wise multiplication with zero) while preserving the dimension of the input.