Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks