Past work in robotic grasping has often relied purely on vision; human grasping, however, also relies heavily on touch. While vision provides coarse-grained information about object location, shape, and size, touch provides complementary fine-grained cues about contact forces, textures, local shape around contact points, and deformability, all of which are critical for evaluating an ongoing grasp. In this paper, we ask: how can a robot exploit tactile information together with vision to iteratively and quickly improve its grasp? We present an end-to-end approach for learning greedy regrasping policies from raw visuo-tactile data. This approach is based on an action-conditional deep model that, given visuo-tactile information about the current grasp and a candidate grasp adjustment, predicts the success probability of the next grasp. Our approach requires neither calibration of the tactile sensors nor any analytical modeling of contact forces, thus significantly reducing the engineering effort required to obtain efficient grasping policies. We trained our visuo-tactile model on data collected over 6,450 grasping trials with a two-finger gripper equipped with a GelSight high-resolution tactile sensor on each finger. An extensive experimental validation against a variety of baselines shows that our approach significantly improves a robot's ability to (i) correctly predict the outcomes of pre-recorded grasps, (ii) successfully grasp an object with the fewest possible attempts, and (iii) use minimal force while maintaining performance.
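To make the action-conditional formulation concrete, the sketch below illustrates (under stated assumptions, not as the paper's actual architecture) how such a model and a greedy regrasping step might look: a network maps visuo-tactile features plus a candidate grasp adjustment to a success probability, and the greedy policy simply selects the candidate with the highest predicted score. The feature dimensions, action parameterization, network sizes, and the random candidate sampler are illustrative assumptions.

```python
# Minimal sketch of an action-conditional grasp-outcome model and a greedy
# regrasping step. Dimensions, layers, and the candidate sampler are assumptions
# for illustration only; they are not the authors' implementation.
import torch
import torch.nn as nn


class GraspOutcomePredictor(nn.Module):
    """Predicts P(next grasp succeeds | current visuo-tactile features, candidate action)."""

    def __init__(self, obs_dim=256, action_dim=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs_feat, action):
        # obs_feat: features from the RGB camera and GelSight images (assumed precomputed)
        # action:   candidate grasp adjustment (e.g., position/orientation/force deltas)
        logits = self.net(torch.cat([obs_feat, action], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # success probability in [0, 1]


def greedy_regrasp_action(model, obs_feat, candidate_actions):
    """Greedy policy: score every sampled candidate adjustment, return the best one."""
    with torch.no_grad():
        obs_batch = obs_feat.expand(candidate_actions.shape[0], -1)
        p_success = model(obs_batch, candidate_actions)
    best = p_success.argmax()
    return candidate_actions[best], p_success[best].item()


# Usage: sample candidate adjustments, pick the one with the highest predicted success.
model = GraspOutcomePredictor()
obs_feat = torch.randn(1, 256)       # placeholder visuo-tactile feature vector
candidates = torch.randn(64, 4)      # placeholder candidate grasp adjustments
best_action, p = greedy_regrasp_action(model, obs_feat, candidates)
```

In practice, the predictor would be trained with a binary cross-entropy loss on labeled grasp outcomes (success/failure) from the collected trials, and the greedy loop would repeat until the predicted success probability exceeds a threshold or the grasp is executed.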