GAIA: Generating Task Instruction Aware Simulation Grounded in Real Contexts using Vision-Language Models