Decomposing Motion and Content for Natural Video Sequence Prediction