Learning to Jointly Understand Visual and Tactile Signals