Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training