Visually grounded few-shot word learning in low-resource settings