Clips 11-12

Clip 11: The lifting of the mask

In this clip I want the Black Knight to lift his mask to reveal that 'he' is actually a woman. Worse than that - she is Fiona Finkeldink, the wicked witch of the East! I will take the photo of Fiona in makeup and ask ChatGPT to make a head-and-shoulders shot of her in black armor. The problem is that I want this image to be the closing image in the clip, not the opening image. I believe that Veo 3 accepts only an opening image, not a closing one, so one solution is to start with the open mask, have Fiona quickly close it and then open it again, and discard the beginning of the clip where she closes the mask. Alternatively I could start with the open mask, have the witch close it, and then reverse the clip so that it plays backwards (though any speech in the clip would still have to play forwards). I give ChatGPT the photo of Fiona and this prompt:

Can you make a head and shoulders shot of this person wearing black armor? The mask of the armor should be open so we can see her face. In the background, the same out of focus crowd of medieval spectators. There are grey clouds. Hyper-realism, neutral white balance and normal brightness. 16:9 aspect ratio.

It's a beautiful image but of course I want her to be wearing an open helmet, ready to close, so I send:

Can you make a similar image but she should be wearing the helmet but with the mask open. The woman should look older and the image should be lighter.

I send a close-up of the helmet from the image with the king and princess, along with this prompt:

Better, but the helmet must have a visor that can be closed, so please use this helmet as a model. The visor should be present but open in the image you make. Please make the image lighter.

I am shocked by ChatGPT's response. It says:

You've hit the Plus plan limit for image generation requests. You can create more images when the limit resets in 720 hours and 0 minutes.

720 hours is 30 days! I just signed up for this paid-for plan a couple of days ago and I've already reached my monthly limit? What monthly limit? I didn't think there was one.

I say:

I don't understand. You told me that the Plus plan limits are as follows: 'Up to 40 image generations every 3 hours. Each "generation" can include up to 4 images, so that's potentially 160 images every 3 hours. Limits refresh on a rolling basis, not at fixed times.' You did not mention that there is a monthly limit.

ChatGPT says:

You’re absolutely right to be confused — there is no monthly limit on the Plus plan. The “720 hours” message you just saw is a bug in how the limit message is being reported. Here’s what’s really going on:

Plus plan normally allows ~40 image generations every 3 hours (rolling reset).
If you hit that short-term cap, you just wait a few hours and it refreshes.
There is no monthly cap that would block you for 720 hours — that number is definitely incorrect.

It seems we’ve hit a temporary glitch in image generation access, not a true limit.

If you’d like, I can keep the helmet-adjusted knight image prompt prepared and we can try again as soon as the image tool is working again.

I say:

Yes, please!

Twenty-four hours later ChatGPT is still telling me that the glitch is unresolved, so I try working with the image of the open helmet that does not appear to have a visor, or perhaps has just a small one. I use this prompt:

Hand-held camera. The camera slowly zooms in towards her face. The woman quickly closes the visor of her helmet with her right hand. The helmet should hide her face except for her eyes, barely visible. The woman then moves her hand down, then brings it up and lifts the visor open. When she opens the visor we hear a man out of shot shout in astonishment with a noble British accent "Great Scot! It's Fiona, the witch!" There is light rain. Hyper-realism.

The result is funny, with that man appearing unasked-for and making an unasked-for exclamation. But the clip is completely unusable - I need a mask that completely hides her face. And the 'Great Scot' speech should come when she opens the mask, not when she closes it.

Since ChatGPT is not available, I send Grok the two relevant images and ask it to do this:

Can you make an image similar to the image of the woman but replacing the helmet with a helmet like in the second image, with a big visor. The visor should be open, as in the image of the woman.

It asks me to confirm the instruction:

Would you like me to generate an image combining the woman from the first image with the helmet from the second image, featuring an open visor? Please confirm if you'd like to proceed.

This is the image it generates. A pathetic level of understanding. I really can't use a medieval motorcycle helmet in my movie.

Not being able to use ChatGPT is so frustrating. I give Google Gemini the profile image of the helmet and ask it what a front view would look like, and it gives me this. The vertical slots aren't consistent with the image I gave it, and it's too shiny and light, but it's not bad. What I really need, though, is an image with the visor raised so I can add in Fiona's face.

In a last desperate attempt I do a Google search for helmets with open visors, and this eventually leads me to an Amazon page advertising this helmet for just $80. (Maybe I'll treat myself to this helmet when I finish this project, if I ever do.)

I fill in the breathing holes, darken the helmet, and spend an hour generating these two images using my Affinity Photo image editor (me, not AI!).

I don't think Veo 3 will let me use a starting and an ending image but Veo 2 can do that. Veo 2 can't do audio but Fiona doesn't say anything in this clip and I already have some audio from the previous clip that I can maybe use. I upload the two images to Veo and use this prompt:

Hand-held camera. The woman uses her right hand to slowly raise the visor of her helmet, then lowers her hand. There is light rain. Hyper-realistic imagery.

Here is the result. I've added (using CapCut) the soundtrack from the previous clip since this clip would otherwise be silent. I've shortened the end of the clip to get rid of the silly "oh" that was at the end of that previous clip:

There's an unasked-for person who walks across the background, perhaps not in medieval clothes, but I don't have time to fix that. Basta! It's the first time I've made a clip with both the starting and ending images set by me, so I'm happy that worked well. AT LAST I have a decent result for this clip!
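For anyone who would rather script the editing steps than use CapCut, here is a minimal sketch of two of them: borrowing the soundtrack from another clip (and cutting the end off), and reversing a clip so that it plays backwards. It is not what was used for this clip. It assumes the ffmpeg command-line tool is installed, and the function names and file names are made up for illustration. The functions only build the commands; nothing is run until you pass one to subprocess.run.

```python
import shlex


def mux_soundtrack_cmd(video, audio_source, out, keep_seconds):
    """Build an ffmpeg command that pairs a silent clip with borrowed audio.

    keep_seconds caps the length of the output, which also cuts off
    anything unwanted at the end of the borrowed soundtrack.
    """
    return [
        "ffmpeg",
        "-i", video,              # input 0: the silent clip
        "-i", audio_source,       # input 1: the clip whose audio is reused
        "-map", "0:v:0",          # take the video stream from input 0
        "-map", "1:a:0",          # take the audio stream from input 1
        "-c:v", "copy",           # leave the video stream untouched
        "-t", str(keep_seconds),  # stop writing output after this many seconds
        out,
    ]


def reverse_video_cmd(video, out):
    """Build an ffmpeg command that plays a clip backwards, without audio.

    The audio is dropped (-an) because speech has to play forwards; it can
    be added back afterwards with mux_soundtrack_cmd. The reverse filter
    buffers the whole clip in memory, so this suits short clips only.
    """
    return ["ffmpeg", "-i", video, "-vf", "reverse", "-an", out]


if __name__ == "__main__":
    # Hypothetical file names, purely for illustration.
    print(shlex.join(mux_soundtrack_cmd("clip11.mp4", "clip10.mp4", "clip11_sound.mp4", 7.5)))
    print(shlex.join(reverse_video_cmd("mask_closing.mp4", "mask_opening.mp4")))
```

To actually run one of the commands, pass the list to subprocess.run(cmd, check=True). The value of keep_seconds has to be found by ear, the same way the trim point is found in CapCut.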

Clip 12: The throwing of the potion

Now we want Fiona the witch to cast her spell by throwing her potion in the princess's face. I'll reuse the image from clip 10 of the witch facing the king and princess, but I'll have to raise Fiona's mask and add her face in profile, which will be difficult unless ChatGPT can help me. ChatGPT is working normally again - the solution was simply to log out and log back in to ChatGPT. But the images ChatGPT produces for me are all off the mark, like this one.

But perhaps I can use that profile of the witch? After half an hour of working with my photo editor I have this:

I crop that a bit and feed it to Veo with the prompt:

Hand-held camera. The person on the left says with an angry woman's voice “Your taxes are too steep so your daughter now must sleep.” Then, with her right hand, she throws a cup of liquid in the face of the woman at the right, who falls to the ground, asleep, while the man at the right reels in shock. The camera tracks the woman on the right as she falls to the ground. Hyper-realism.

This will be a challenging prompt for Veo! Here is the result:

Better than I expected, but the sudden appearance of the large cup is a bit surprising and the liquid is thrown at the king rather than the princess. I try again with a modified prompt:

Hand-held camera. The person on the left says with an angry woman's voice “Your taxes are too steep so your daughter now must sleep.” With her right hand, she throws a small cup of liquid into the face and onto the right arm of the woman at the center, who falls to the ground, asleep, while the man at the right reels in shock. The camera tracks the woman in the center as she falls to the ground where she lies still. Hyper-realism.

That's a little better, though the liquid still isn't thrown straight at the princess and the princess doesn't end up lying on the ground. It'll have to do. Having said that, it's apparently possible to mark up images to help Veo understand. For example, one could say "her right hand should follow the red arrow as she throws the potion in the woman's face". I'll have to try the markup option later.

UPDATE: In the end I have to use version 1 because I notice that in version 2 most of the witch's makeup is missing.

Click HERE for the next clips.
