"Watch, Learn, Help: Proactive Robot Assistants using Vision-Language Models