Mutual exclusivity (ME) in visually grounded speech models