More senses engaged while learning = better recall
In general, the more senses we engage while learning new information, the more likely we are to remember it.
From an evolutionary standpoint, human brains evolved to identify visual patterns far earlier than they learned to identify spoken words or to read text. Research confirms that using simple pictures when learning new vocabulary helps build recognition and memory connections faster than spoken and written words alone.
For beginning English learners, use visual, auditory, and text cues as much as possible when teaching and practicing vocabulary. Where you can, add tactile cues by using real objects in an in-person class, or by asking learners to find [an object] in their home during a Zoom class. Use "touch your ..." games when teaching body parts.
When using images, try to find visually engaging photos rather than cheesy clip art. My two favorite image sources are Pexels.com and Pixabay.com; both offer images that are freely licensed for use without requiring source attribution.
For listening practice, Youglish.com lets you quickly find videos that include any spoken phrase, often from TED Talks or other presentations. I use audio snippets from Youglish to help students pick out vocabulary words in conversations at a normal speaking pace.
Here's an example of a slide that I use when practicing job title vocabulary. We read the four titles, look at the pictures, and match the pictures with the titles (there's an extra title as a distractor). Then I play each audio clip so the students can see which of the vocabulary words they can identify in it. The playback speed for the audio clips can be slowed down if needed.
I use a lot of videos, mainly as homework: either before class to introduce a concept that we will be working on, or after class to reinforce the lesson.