MViTac: Self-Supervised Visual-Tactile Representation Learning via Multimodal Contrastive Training

Vedant Dave*

Fotios Lygerakis*

Elmar Rueckert

*Equal Contribution

Abstract

This paper addresses the challenge of integrating visual and tactile sensory data within robotic systems. We propose MViTac, a novel self-supervised framework that combines intra-modal and inter-modal contrastive learning to learn joint visual-tactile representations. On material property identification and grasp success prediction, MViTac outperforms existing self-supervised methods as well as several supervised baselines.
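For a concrete picture of how intra-modal and inter-modal contrastive terms can be combined, the sketch below uses a standard InfoNCE formulation in PyTorch. The function names, weighting coefficients `alpha`/`beta`, and temperature value are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of combined intra-/inter-modal contrastive training.
# Encoder outputs, weighting, and temperature are assumptions for illustration.
import torch
import torch.nn.functional as F


def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss: the i-th anchor should match the i-th positive."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)


def visual_tactile_loss(z_vis_1, z_vis_2, z_tac_1, z_tac_2,
                        alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    """Intra-modal terms contrast two augmented views of the same modality;
    inter-modal terms contrast paired visual and tactile embeddings."""
    intra = info_nce(z_vis_1, z_vis_2) + info_nce(z_tac_1, z_tac_2)
    inter = info_nce(z_vis_1, z_tac_1) + info_nce(z_tac_1, z_vis_1)
    return alpha * intra + beta * inter
```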

Introduction

Methodology

MViTac Loss

Intra-modal loss

Inter-modal loss

Findings

Material property identification

Robot Grasping Success

Discussion

Cite

@misc{dave2024multimodal,
  title={Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training},
  author={Vedant Dave and Fotios Lygerakis and Elmar Rueckert},
  year={2024},
  eprint={2401.12024},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}

Contact us:

For any questions, you can contact us at: vedant.dave@unileoben.ac.at