AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos