IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks