Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards