After more than a hundred hours of work over three weeks in August 2025, my short (8 minute) movie is now finished and viewable HERE. Quality subtitles in English and French are available. How can it take a hundred hours to generate eight minutes of video, or 1.5 hours for each 8 second clip? Find out by spending an hour or three working through this 12 page 'making of'. This page describes the creation of the first 8 second clip and other pages describe the work behind later clips. You can find links to those other pages in the menu top left or, for your convenience, right here:
1 The witch's discontent
2-3 The witch prepares the potion, Looking down into the cauldron
4-5 King Brian addresses his daughter, Princess Caroline replies to the king
6-8 The princess's gift, Sir Peter's pledge, Sir Peter mounts his horse
9-10 The clash, King Brian keeps his promise
11-12 The lifting of the mask, The throwing of the potion
13-18 The witch presents a puzzle, The King calls for his best knight, Sir Nigel arrives, Sir Nigel speaks to King Brian, Sir Nigel decrypts the cryptogram, The quest for the Holy Snail begins
19-21 Sir Nigel stops at an inn, Sir Nigel meets Madame Julia, Sir Nigel meets Harriet
22-25 Drunk Sir Nigel is robbed, Sir Nigel's horse has gone, A night in the forest, Over the mountains
26-29 Borrowing a boat, Crossing the river, The mushroom forest, The Holy Snail
30-34 Sir Nigel's Request, Slime For Sale, But I have no money..., The Holy Snail responds
35-40 V.A.T., The proposition, Sir Nigel bites the Mushroom, The trip, John Cheese, Epilogue
In July 2025 I started work on a project to make a very short little video (maybe about 4 minutes?) with a proper storyline and including the names of people in my family. I will also try to make my characters resemble those family members. Ideally I would like my family members to speak their lines, perhaps using clones of their voices, but I haven't arranged that with them so it's probably not going to happen. My short video will probably have the theme of a quest, set in medieval times, inspired a bit by Monty Python and the Holy Grail.
In the opening scene a witch is preparing a potion. I cast sister Fiona in this role because she published a photo of herself made up for El Dia de los Muertos so she pretty much volunteered for this role:
The Google Veo 3 AI video generator now has the option of making videos from a starting image (plus a prompt) as opposed to just a prompt. Since one has great control over the starting image, it's a major improvement. It means I have control over the appearance of the characters, so I should be able to make them look like family members. And it should also mean that the last frame of the generated video can be used as the first frame of the next video. Those clips can then be combined into one, thereby breaking the eight second limit for continuous clips.
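The chaining idea (save the last frame of one clip, use it as the starting image of the next, then join the clips) could also be scripted. What follows is only a sketch, assuming the ffmpeg command-line tool is installed; the file names are hypothetical and the functions just build the commands without running them.

```python
# Sketch only: assumes the ffmpeg command-line tool is installed.
# All file names below are hypothetical examples.


def last_frame_cmd(clip: str, frame_png: str) -> list[str]:
    """Build an ffmpeg command that saves the final frame of `clip` as an image."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-1",   # start reading one second before the end of the input
        "-i", clip,
        "-update", "1",   # keep overwriting a single image, so the last frame wins
        frame_png,
    ]


def concat_list(clips: list[str]) -> str:
    """Text of the list file read by ffmpeg's concat demuxer, one clip per line."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_cmd(list_file: str, output: str) -> list[str]:
    """Build an ffmpeg command that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_file,
        "-c", "copy",     # stream copy: the clips must share codec and resolution
        output,
    ]


if __name__ == "__main__":
    print(" ".join(last_frame_cmd("clip01.mp4", "clip01_last.png")))
    print(concat_list(["clip01.mp4", "clip02.mp4"]), end="")
    print(" ".join(concat_cmd("clips.txt", "movie.mp4")))
```

Each command list could be passed to subprocess.run to execute it. The stream copy in the last step should be fine for clips that all come out of the same generator at the same resolution; clips with different settings would need re-encoding instead.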
I gave the photo above and the following prompt to Grok and ChatGPT and asked them to generate a suitable first frame:
Can you make this person into a witch in a witch's cavern? The cavern is dimly lit by burning torches. The witch is carrying a dead rat and some herbs towards a large bubbling cauldron from which steam is rising. 16:9 format. The atmosphere is menacing.
Here is the Grok result (1080x1233), followed by the ChatGPT result (1536x1024):
The left image, made by Grok, has kept Fiona's face almost unchanged, which is good, but the aspect ratio is not at all 16:9, there is no large cauldron, the dead rat looks more like a live mouse (and it has an extra ear!) and the atmosphere is not sinister enough.
The ChatGPT image has a 3:2 aspect ratio, much closer to 16:9, good atmosphere but much less recognizable Fiona traits. Also the resolution of the image is much higher. I don't like that the image is quite symmetric (boring) but when this is used to start the video I can ask the camera to move so as to break the symmetry. ChatGPT has taken the initiative to add a big witch's hat, which is fine.
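To put numbers on how far each image is from 16:9, here is a small arithmetic sketch. It assumes the image keeps its height and only gains width; how an extension tool actually chooses its canvas size is not something I know, so the helper names here are purely illustrative.

```python
from fractions import Fraction


def aspect_ratio(width: int, height: int) -> Fraction:
    """Aspect ratio reduced to lowest terms, e.g. 1536x1024 gives 3/2."""
    return Fraction(width, height)


def width_for_16_9(height: int) -> int:
    """Smallest whole-pixel width that makes an image of this height at least 16:9."""
    return (height * 16 + 8) // 9  # integer ceiling of height * 16 / 9


# Pixel sizes of the two generated images, as quoted in the text.
images = {"Grok": (1080, 1233), "ChatGPT": (1536, 1024)}

for name, (w, h) in images.items():
    extra = width_for_16_9(h) - w
    print(f"{name}: ratio {aspect_ratio(w, h)}, needs {extra} more pixels of width")
```

On these figures the Grok image (ratio 120/137, taller than it is wide) would need roughly to double in width, whereas the 3:2 ChatGPT image needs less than a fifth more.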
Pollo.ai has an option to use AI to extend an image and thus change its aspect ratio. I used it to extend the mousey image above into 16:9 format and got this pleasing result below. I also used my Photos app to remove the mouse's extra ear.
I decided that the above image would be used as the last frame of the first 8 second clip in my movie. For the first frame, I would use a similar image but with the arms down and a bit more of a close up. I made that in ChatGPT by asking it to modify the rather square image above and then extended the result in Pollo to get this:
I was at last ready to generate the first 8 seconds of my video, or was I? I realized my free Google Vertex AI account cannot do image to video, only prompt to video, so I signed up for a Google AI Pro account (about €22 a month, first month free) which does have that feature. I uploaded the image above that was supposed to be the first frame and... disaster... I got the following message:
WTF? I can use the image to video feature to make a movie only if my movie contains no people??? This is madness. Is it possible that this restriction does not apply in the US? I set a VPN (using the free VPN Windscribe, connected to LA) and managed to connect to flow.google to have another go, though passing through a VPN may slow down my workflow. This time I am allowed to upload my two images and I add the following prompt:
A witch walks forwards as the camera tracks her. She says angrily “So King Brian wants to apply a tax to black magic, does he? He shall be punished! My potion will make his daughter fall asleep forever!” While she talks there is thunder in the distance. As she finishes talking she holds up a rat and some herbs.
When I hit the 'generate video' button I get the message that I cannot use Veo 3 in this configuration (providing my own images) and have to use Veo 2 instead. But Veo 2 does not generate audio!! I go ahead anyway, keeping the voice instructions in the prompt. I have to choose between Veo 2 Quality and Veo 2 Fast. The former is higher quality, more time-consuming to generate and more expensive in credits. I choose Veo 2 Fast. After 2 minutes the generation is 99% complete. After 10 minutes the generation is still 99% complete and then I get a message that the generation has failed, with no reason given. Google charges me the full 10 credits anyway (the Google AI Pro subscription gives me 1000 credits per month). This is getting very frustrating. I try again after removing the audio instructions from the prompt to leave:
The atmosphere is sinister for the cavern is lit by the flickering light of burning torches. A witch walks forwards as the camera tracks her. She holds up a rat and some herbs. Hyper-realistic imagery. No subtitles.
This time the generation is more or less successful. I upscale the video from 720p to 1080p before downloading it, which costs no credits, just some time.
But there is no sound and there is a discontinuity in the background. Maybe I can solve the background problem by making her turn as she walks, or cutting to a different room? I try using the same two images and this prompt:
The atmosphere is sinister for the cavern is lit by the flickering light of burning torches. A witch walks forwards, turning 90 degrees to her right, as the camera tracks her. She holds up a rat and some herbs. Hyper-realistic imagery. No subtitles.
After about 20 minutes, Veo 2 'Fast' tells me the generation has failed again. I make a last attempt, this time providing the final frame but not the first frame, accepting that the beginning of the video will probably be a less good likeness of Fiona. I revert to the prompt:
The atmosphere is sinister for the cavern is lit by the flickering light of burning torches. A witch walks forwards as the camera tracks her. She holds up a rat and some herbs. Hyper-realistic imagery.
No, that doesn't work either - Veo can't generate backwards from a final image, only forwards from a starting image.
Is it possible to take a silent video like this and then make the character speak, using AI to animate the mouth? Maybe - I'll have to look into that. So much work just to get 8 seconds of video!
On the next day, I have the idea that maybe the reason I couldn't use Veo 3 (which can generate audio) is that I included both a starting image and a final image - maybe with only a starting image I can use it? I tried that, adding this prompt:
A witch walks forwards as the camera tracks her. She is holding a dead mouse and some herbs. She says angrily “So King Brian wants to apply a tax to black magic, does he? He shall be punished! My potion will make his daughter fall asleep forever!” While she talks there is thunder in the distance.
The generation is refused, with the message saying something about protecting minors. Maybe minors must be protected from seeing a dead mouse? I try again after removing the word 'dead'. Veo 3 doesn't seem to react. I try again - no reaction. I try again, using an even softer prompt:
The woman walks forwards as the camera tracks her. She says angrily “So King Brian wants to apply a tax to black magic, does he? He shall be punished! My potion will make his daughter fall asleep forever!” While she talks there is thunder in the distance.
I get this message:
An hour later, I notice that two of the generations that I thought had failed had actually been processed at last. Fiona has found her voice! Except it's not hers of course. If my AI victims are willing to let me clone their voices then I can actually modify the voice in the video to resemble theirs. Here are the instructions for that, if you're willing. Without using that trick it's likely that the same character will have a different voice from one clip to the next. Here is the better of the two talking videos:
As you can see, there are still a couple of problems in addition to the voice. The intonation of the words "does he?" isn't quite right. I asked for the sound of distant thunder and Veo has made lightning flash inside the cavern! Also, the witch is not walking even though I asked her to. Maybe I shouldn't expect witches to do what they're told.
So far I have used my own prompts but many users recommend that the prompt should be optimized for Veo 3 using ChatGPT. For the time being though I'll just try with this prompt:
The woman walks forwards as the camera tracks her. She says angrily “So King Brian wants to apply a tax to black magic does he? He shall be punished! My potion will make his daughter fall asleep forever!” As she walks she shows the mouse to the camera for a closer view. A rat runs across the floor behind her. Hyper-realistic imagery.
Another failed generation. How can it fail when very similar prompts have (sometimes) been successful? What is the problem? Are the servers overloaded? Is it because I'm using a VPN? Would I be more successful if I used prompts only rather than image plus prompt? I tried again with exactly the same prompt. Another fail. Each time Veo 3 churns for 15 minutes before declaring failure, so this is a very slow and frustrating process.
I shorten the prompt to this and try again:
The woman walks forwards as the camera tracks her. She says angrily “So King Brian wants to apply a tax to black magic does he? He shall be punished! My potion will make his daughter fall asleep forever!”
This time the prompt is rejected as possibly breaching child protection policies.
What can possibly be wrong with it? Are children such snowflakes that they shouldn't hear the words "black magic", "punished" or "angrily"? They all were accepted before.
I ask ChatGPT to optimise the prompt for me. Here is the conversation:
Can you optimise this prompt for Veo 3? The woman walks forwards as the camera tracks her. She says angrily “So King Brian wants to apply a tax to black magic does he? He shall be punished! My potion will make his daughter fall asleep forever!” As she walks she shows the mouse to the camera for a closer view. A rat runs across the floor behind her. Hyper-realistic imagery.
ChatGPT said:
Here’s an optimised version of your prompt tailored for Veo 3 (OpenAI’s video generation model), using clear visual cues, emotional tone, and cinematic direction for hyper-realistic results:
Optimised Prompt for Veo 3:
A hyper-realistic scene. A woman walks forward with determination as the camera tracks her smoothly in a medium close-up. Her face is tense with anger. She speaks with fury: “So King Brian wants to apply a tax to black magic, does he? He shall be punished! My potion will make his daughter fall asleep forever!” As she walks, she lifts the small mouse toward the camera, holding it up for a closer view. In the background, a rat darts quickly across the floor. The lighting is moody and cinematic, with deep shadows and sharp detail.
Nigel said:
Your prompt gets this response from Veo 3: This prompt might violate our policies about generating harmful content related to minors. Please try a different prompt or send feedback.
ChatGPT said:
Thanks for the heads-up — Veo 3 is highly sensitive to anything that could be interpreted as harm toward children, even indirectly. The phrase “his daughter fall asleep forever” is likely being interpreted as suggesting harm to a minor, even though it references a fantasy trope.
Here’s a revised version that keeps the tone and intention intact but avoids any red-flag language, especially around children:
Revised Prompt for Veo 3 (policy-safe):
A hyper-realistic scene. A woman walks forward with intensity as the camera tracks her in a smooth, cinematic medium shot. Her expression is fierce with anger. She declares: “So King Brian wants to apply a tax to black magic, does he? He shall regret it! My potion will bring his court to its knees!” As she walks, she raises a small mouse toward the camera, revealing it clearly. Behind her, a rat scurries quickly across the floor. The atmosphere is dark and dramatic, with moody lighting and sharp, detailed imagery.
I try with this slightly different prompt:
A hyper-realistic scene. A woman walks forward with intensity as the camera tracks her in a smooth, cinematic medium shot. Her expression is fierce with anger. She declares: “So King Brian wants to apply a tax to black magic, does he? He shall regret it! My potion will make him think again!” As she walks, she raises a small mouse toward the camera, revealing it clearly. Behind her, a rat scurries quickly across the floor. The atmosphere is dark and dramatic, with moody lighting and sharp, detailed imagery.
Another fail. By now I am tearing my hair out. I switch to library view in Flow to count how many failures and how many successes there have been and am very surprised to see that in the library... there are videos that I had thought had failed!!! But I'm able to download only one of the recent videos in 1080p resolution, the one with the prompt developed by ChatGPT:
In my time-consuming efforts to generate an acceptable eight second clip to begin my little movie I have had 8 'successes' (including just one that is very usable) and 7 failures. I've burnt through 130 credits of the 1000 that I get each month, which makes me think that Google has charged me for all 13 attempts, including the failures, even though they say they don't charge for failures. Comments in forums say those credits should be reimbursed to me, in which case my first clip will have cost 80 credits, which is worth about 80/1000 x €20 = €1.60, which is fine with me. The learning curve was steep but I'm confident that from now on generating usable clips will be much faster and cheaper...
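The credit arithmetic can be checked with a few lines of Python. This is just a sketch of the sum above: it assumes 1000 credits per month and treats the monthly price as a parameter, since I quoted about €22 for the subscription earlier and rounded to €20 in the sum; the function name is my own.

```python
def cost_in_euros(credits_used: int, monthly_price_eur: float,
                  monthly_credits: int = 1000) -> float:
    """Share of the monthly subscription price represented by the credits used."""
    return credits_used / monthly_credits * monthly_price_eur


# 80 credits if the failures are refunded, 130 credits if they are not.
for credits in (80, 130):
    for price in (20.0, 22.0):
        cost = cost_in_euros(credits, price)
        print(f"{credits} credits at EUR {price:.0f}/month: EUR {cost:.2f}")
```

Whichever price is used, the first clip comes out at well under €3 even if none of the failed attempts is refunded.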
It occurs to me that it might be extremely easy to make the videos in more than one language - all that should be necessary is to mention the target language in the prompt like this:
A hyper-realistic scene. A woman walks forward with intensity as the camera tracks her in a smooth, cinematic medium shot. Her expression is fierce with anger. She declares in French “So King Brian wants to apply a tax to black magic, does he? He shall regret it! My potion will make him think again!” As she walks, she raises a small mouse toward the camera, revealing it clearly. Behind her, a rat scurries quickly across the floor. The atmosphere is dark and dramatic, with moody lighting and sharp, detailed imagery.
Note that I'm not talking about dubbing here - I'm talking about the characters actually speaking the alternate language, with correct lip sync. It used to be necessary to do it as above, giving the text in English and then telling Veo to speak it in a different language (so Veo has to do the translation). It may be that it is now possible for the prompt to include the actual text in the target language, which might avoid translation mistakes by Veo. But in my case Veo does a great job:
I'm starting to get the impression I will be using mainly these sites and apps for this project:
ChatGPT for generating and modifying the images. Also for optimizing prompts for Veo 3 (update: I later experimented with Midjourney also for image generation)
Pollo.ai to expand the images from 3:2 into 16:9 format
Sometimes my photo editor, Affinity Photo or Microsoft Photos, to correct images, e.g. removing the extra ear from the mouse
Google Flow (at labs.google) to generate the AI clips with the Veo 3 model and upscale them from 720p to 1080p.
Elevenlabs.com to make voice clones and modify voices in the videos if necessary.
CapCut to assemble the clips into a movie, add credits etc. Note that I have found a legal way to make use of the many AI functions in CapCut that would normally require a paid subscription.
YouTube to publish clips and the finished video.
Google Sites to log my AI journey.
Update: I started using Midjourney as an alternative to ChatGPT at some point to generate images.
But remember that AI image and video generation is progressing very fast and by the time you read this my workflow and this selection of apps may no longer be optimal.
Since this page is getting rather long, I've decided that I will probably make one page for each couple of clips. Click HERE for the development of clips 2 and 3, where the witch is preparing her potion.