I made a movie using AI! My short (8 minute) movie 'The Quest' is now finished and viewable HERE. Quality subtitles in English and French are available. How can it take a hundred hours to generate eight minutes of video, or 1.5 hours for each 8 second clip? Find out by spending an hour or three working through the 12 page 'making of'.
Of course you're aware of the massive impact AI is having on many aspects of our lives, sometimes in a positive way and sometimes not. The recent rapid progress in AI generation of images and videos has impressed me so much that I've been experimenting with both since early 2025. This site presents some of those experiments, some of which were successful and some not.
In early 2025 there was a trend for turning people into blister-packed dolls. I took the photo of my talented dancer great-niece Erin (at left) and used ChatGPT to make her into a packaged doll. My first attempt, centre, was rather glum and the mask looked like something out of Silence of the Lambs, so I continued chatting with ChatGPT until I got the nice result at right. The whole process is free - ask me if you want to try it.
I also made one of myself. The resemblance is stunning, don't you think?
Videos of babies in podcast studios were also a thing in early 2025. To make mine, I first asked ChatGPT to make this photo of me into a photo of a baby in a podcast studio, then I recorded my voice speaking a few words. Then I uploaded the photo and sound file to Hedra.com and asked that AI to make the photo into a video with the baby me speaking those words. Of course, I have old photos of myself as a baby that I could have used, but it was interesting to see whether ChatGPT could do a decent job guessing what I once looked like. It probably did - I certainly had a chubby face back then.
The first video refers to the family gathering that took place in Buxton, capital of the Peak District in England, in June 2025. The lip sync worked well. I was a bit disappointed that baby me is not wearing the headphones, so I did another video with a prompt specifically requesting that they should be on...
The second video is a birthday message for my niece Eva. Videos made on Hedra.com are limited to 10 seconds in length, which seems to be a common restriction for AI videos, though sometimes one can make a 10 second video and then extend it little by little until it lasts minutes. Hedra is one of the many AI models focused on 'talking head' videos and isn't much good for anything else.
In June 2025 I made another baby podcast video for the family gathering in Buxton, England. It was a limerick tribute to sister Fiona, who experienced two setbacks during her week-long visit to Costa Rica: first she was stung by a stingray and then she had a nightmarish return to the UK after her flight from Houston was cancelled. My nephew Adam kindly posed for the photo and read the limerick. Strangely, ChatGPT removed his glasses when babyfying him.
In May 2025 it suddenly became possible to do life-like AI video creations thanks to Google's Veo 3 AI. What's more, unlike most AI video generators, Veo 3 can produce a video with sound effects and a lip-synced voice in a single pass. But Veo 3 is expensive to use (it could cost more than a dollar per second of video) so I looked through the dozens of alternatives and concluded that Runway (https://runwayml.com) and Kling (www.klingai.com/global) are the next best options. It's also possible to sign up for a package that could include Veo 3, such as Pollo AI, which gives you the option of choosing between a number of AI generators instead of committing to just one.
I'm thinking it would be nice to make a longish (5 minute?) video one day in the spirit of Monty Python and the Holy Grail so I signed up to Runway and asked it to generate an image based on this prompt:
King Henry 8 is speaking to his court in the great hall of this castle. There are many people and the atmosphere is boisterous. A raging log fire warms the space.
I like the result but it's not perfect: the people are glum, not boisterous, and they're not looking at their king. The composition is quite symmetrical, which is dull. And there is no log fire. The image has an aspect ratio of 16:9 (1280x720), as I requested outside the prompt.
With a free account on Runway you get 125 credits and making this image cost 5 credits so I could do 25 images like this free. But the real interest is to make videos, not still images. The free version of Runway can't make a video just from a written prompt like most generators can - it needs a seed image which will be used as the first frame of the video. Maybe I will indeed use this still image above as the seed for a video clip...
But why use Runway to make an AI image when ChatGPT can do it too? Here's what ChatGPT generated with the same prompt. I forgot to specify the aspect ratio as 16:9 as I had for Runway, and ChatGPT gave me an image with 3:2 aspect ratio (dimensions 1536x1024).
That image is entirely faithful to the prompt, including the log fire. But it seems too dark, too brown, very flat and the boisterous characters are not very life-like. But at least they are 'boisterous'. Maybe the other models do not understand that word?
While we're at it, let's see what Grok (Elon Musk's AI) and Google Gemini can do:
Grok gave me two images and I chose this one. It's by far the most photo-realistic image and the character is just as you would expect Henry 8 to look (although in fairness to Runway, he must have been young once). But you can hardly see the boisterous crowd or the great hall, and the 'log fire' looks quite weird. The image is 3:4 (720x960).
The Gemini image has the most impressive great hall and a good log fire but the image isn't very photo-realistic and the king is too small relative to the image. Also, the crowd is not boisterous. The image is square (2048x2048).
The four images are quite different and it's fascinating to compare them. Of course if you were to give the same prompt to four human artists or photographers they would also produce very different images (and it would take them hours rather than seconds to do so, and probably wouldn't be free). Which image do you prefer? With all these models except perhaps Runway it would be possible to have a conversation to refine the image into something closer to what you want, as I did with 'packaged Erin'. That being the case, I would go for Grok's photo-realistic image and try to refine it. Note that I did not specify in the prompt that I wanted a photo-realistic image, as I should have done.
Runway can also make an image based on a combination of a reference image and a prompt. I uploaded this image of my sister-in-law Shusheel (below left) as a reference image and added a prompt in the form of a verse from a poem about Shusheel written by sister Julia at last year's family gathering:
With yoga you bend
on bike cover miles
always bring a kind word
and beautiful smiles
This is an extremely difficult prompt for an AI to understand and turn into an image. Here below is the result.
The result was quite disappointing, though I hadn't expected a good result in the first place. There is no resemblance to Shusheel and I suppose the gibberish text appeared because of the vague mention of 'word' in the prompt. Also, I didn't ask for two separate images. Prompting for AI is very much an art, and the quality of prompting will make or break the result. This image generation cost 10 credits due to the use of a reference image.
I'm expecting most of the videos that I make with AI to be disappointing and unusable. But the technology is progressing fast - it will be a different story and much cheaper in six months' time.
In March 2025 my aunt Daphne sadly passed away and I attended the funeral in England. After the funeral there was a reception and this photo of me with my brothers Peter and Simon was photobombed by a stranger. I asked ChatGPT to remove the photobomber. Here is the original image and the image generated by ChatGPT.
The photobomber has gone, but so have my brothers and I, replaced by these similar-looking but rather creepy characters (especially the one on the left). Why did it make me cross my arms the opposite way? And why did the plant in the picture at left lose all its leaves?
Of course, it would also be possible to combine the two images in a photo editor and have the best of both worlds.
This demo video below (not made by me) was created to demonstrate the power of Veo 3. The video, based on a short text prompt, demonstrates near-perfect realism, and the lip syncing of the voice is also excellent. But each time AI generates a video from the same prompt the result will be different - it's possible the same prompt was run a hundred times before this impressive result was achieved. Here is the prompt that was used:
A medium shot frames an old sailor, his knitted blue sailor hat casting a shadow over his eyes, a thick grey beard obscuring his chin. He holds his pipe in one hand, gesturing with it towards the churning, grey sea beyond the ship's railing. "This ocean, it's a force, a wild, untamed might. And she commands your awe, with every breaking light"
I hereby present below my first proper attempt at making an AI video, using Veo 3. It imagines protests against tourism in the town of Buxton where I and my family will stay in a couple of days' time. Here is the prompt I used. Notice that it includes an instruction to include sound (chanting):
It is raining. An angry crowd is gathered in the street, chanting FREE BUXTON and waving placards. One placard reads WARDLINGS GO HOME, another reads NO TOURISTS and another reads FREE BUXTON. In the background a couple of cars are burning, sending smoke into the air. The crowd is menacingly advancing towards the camera.
I'm very happy with the result. There must be a hundred characters in this scene, all with dynamic movements, making it a big challenge for the AI. At first glance it seems quite convincing but look again and you will see distorted faces, overlapping bodies etc. With fewer characters and less movement Veo 3 would be better able to generate the life-like video for which it is famous. The sound works well except that it doesn't perfectly match the prompt, which asked for only the words "Free Buxton" to be chanted.
I used Google Veo 3 as part of a package called Pollo AI (pollo.ai/home). I signed up for a Pollo plan that costs €27 a month, for which I get 800 credits a month. The 8 second video cost 330 credits so you could say my video cost 27 x 330/800 ≈ €11, so more than €1 per second. Sounds expensive? Imagine how much it would cost to make that video with real actors. At that rate, using Veo 3, a five minute movie would cost 300/8 x 11 ≈ €413! In fact it would cost much more because apparently most attempts can be expected to be unusable. But my video was very short and I got lucky - the video was very usable, with no unwanted subtitles.
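For anyone who wants to play with the numbers, the back-of-envelope pricing above can be sketched in a few lines of Python. The figures are the ones quoted here (€27/month, 800 credits, 330 credits per 8-second clip); the helper names are my own invention:

```python
# Back-of-envelope Veo 3 pricing via Pollo credits.
# Plan figures from the text: €27/month buys 800 credits,
# and one 8-second Veo 3 clip costs 330 credits.

def euros_per_clip(plan_eur=27, plan_credits=800, clip_credits=330):
    """Cost in euros of one clip, pro-rated from the monthly plan."""
    return plan_eur * clip_credits / plan_credits

def euros_per_movie(clip_seconds=8, movie_seconds=300):
    """Naive cost of a movie stitched from equal-length clips,
    assuming every generation is usable (in practice it won't be)."""
    clips_needed = movie_seconds / clip_seconds
    return clips_needed * euros_per_clip()

print(round(euros_per_clip(), 2))   # 11.14 euros per 8-second clip
print(round(euros_per_movie(), 2))  # 417.66 euros for a 5-minute movie
```

The exact five-minute figure comes out slightly above the €413 in the text because the text rounds the per-clip cost down to €11 before multiplying.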
I do want to make a longer video, maybe five minutes, but I'll have to use an inferior video generator or find a cheaper way to use Veo 3.
For my second attempt at making a 'Buxton riot' video I used the Pollo 1.6 model that is available within Pollo. I used this prompt (it excludes the instruction to make the people chant because I assume this generator cannot do sound):
It is raining. An angry crowd is gathered in the street, waving placards. One placard reads WARDLINGS GO HOME, another reads NO TOURISTS and another reads FREE BUXTON. In the background a couple of cars are burning, sending smoke into the air. The crowd is menacingly advancing towards the camera. Cinematic hyper-realism.
The generator refused to make the video on the grounds that my prompt was 'sensitive or harmful'. That's a shame because it's a relatively cheap generator: 20 credits for 10 seconds as opposed to 330 credits for 8 seconds in Veo 3. It's common for AI video generation models to censor anything violent or anything portraying celebrities.
For my third attempt at making a 'Buxton riot' video I used the Kling 2.0 model that is available within Pollo. This is meant to be one of the very best models, almost on a par with Veo 3, but it is not capable of generating audio in a single pass like Veo 3 can. I used the same prompt as above and got this result:
This is very disappointing to me. The motions seem natural except for a large sign that is trying to take off, but there are relatively few people and much less dynamism and anger than in the Veo 3 version. At least the camera is moving. I didn't ask for cars driving by. I see flames but not cars burning. There is a strange ugly dark spire in the sky in front of odd-looking clouds. I had assumed that 'street' implies an urban context but it has given me a rather rural road instead. And above all, of course, all the text is gibberish, which is a very common problem with AI video, but less so with Veo 3. At least it is more obvious than in the Veo video that it is raining. It's surprising to see how different the videos can be based on almost identical prompts.
Kling 2.0 is almost as expensive as Veo 3 in Pollo credits. It costs 200 credits (about €7) for a 10 second video with no sound. Kling 2.0 lets you choose between 'creative' and 'adherence to prompt'. I chose 75% adherence. This video took much longer than the Veo 3 video to process, perhaps 10 minutes. Video typically has 24 frames per second so in order to make a 10 second video the AI has to generate 240 images. AI video creation uses huge amounts of computing power - hence the cost.
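The frame count and the price comparisons above generalise easily: at 24 frames per second the clip length fixes how many images the model must synthesise, and dividing credits by seconds puts the models on a common footing. A quick sketch, using only the credit figures quoted on this page (June 2025 Pollo prices):

```python
# Frame counts and credits-per-second for the clips discussed above.

FPS = 24  # typical video frame rate, as assumed in the text

def frames(seconds, fps=FPS):
    """Number of images the AI must generate for a clip."""
    return seconds * fps

# (model, credits charged, clip length in seconds) - figures from the text
models = [("Veo 3", 330, 8), ("Kling 2.0", 200, 10), ("Pollo 1.6", 20, 10)]

print(frames(10))  # 240 frames for a 10-second clip
for name, credits, secs in models:
    print(f"{name}: {credits / secs:.2f} credits/second")
```

Per second, Veo 3 works out at 41.25 credits, Kling 2.0 at 20, and Pollo 1.6 at just 2 - which is why the censorship refusal from the cheap model stings.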
In Kling's favor, it does have one valuable feature that is currently missing from Veo 3: it can make a video using an image that you provide as the first frame. That gives you much greater control over the video and makes it possible to use consistent characters from scene to scene.
When using image-to-video it is still normal to include a prompt to specify the motions of the characters and the camera.
I wanted to also use Runway to make a video from the same prompt but their best model, Gen 4, cannot currently (June 2025) make a video from a text prompt alone, only from an image accompanied by a text prompt. There are older Runway models that can do text-to-video but they are not available in the free plan that I am currently using.
I thought I'd make an image to celebrate the start of Wardling Week so I put this prompt into ChatGPT, Grok and Gemini.
A starfighter is taking off from a launch pad on the surface of Mars while a few astronauts in spacesuits watch from the ground. On the side of the starfighter are the words WARDLINGS 1 written in big letters.
Who won this time? Gemini did with this image. The Grok and ChatGPT images looked lame, but again I forgot to specify 'photo-realistic'.
But what is that weird spike on the top of the starfighter? I continued my conversation with Gemini to try to get rid of it, but I prefer the original weird image.
When I said "Can you make the same image without the tall spike object on top of the starfighter?" it gave me this. But what happened to the launch pad? I tried again with "Can you make an image using this vehicle but with flames shooting out downwards, with no supporting structure, and a launch pad on the ground?" and got the third image. Not bad, but I think the first image is the most interesting. You?
On returning from the family gathering in Buxton in June 2025 and during preparations for next year's trip, I wanted to make a clip about the process of choosing the destination. I tried to make a still image to use as a seed for a video clip but wasn't very pleased with the result so I decided to just use a prompt to make the clip with Veo 3. Here is the result - the story behind this clip is here.
Here's a clip I made in Veo 3 for my friend Barbara's birthday. The prompt was
35mm film. We are inside an old British submarine. The atmosphere is cramped and claustrophobic. We see several submariners in uniform who are looking towards the camera. One of the submariners is closer than the others and he is better dressed than the others. He says "Captain Nigel and all the crew say...". Then he and all the others shout loudly and warmly "Happy Birthday, Barbara!" Then they look at one another, laughing. Background sounds: the hum of machinery and the ping of the submarine.
It came out okay but the symmetrical composition is dull, Captain Nigel is less handsome than he should be, and the way they all suddenly stop laughing is disconcerting (I could edit it out). Also, Veo 3 has again given me black bars top and bottom (I cropped them out) whereas it is supposed to give me 16:9 format. The video is quite different to what I had imagined: everyone seated at their workstations and a more close-up view of Captain Nigel. My fault for not specifying that. Lastly, the camera is static until it moves back a bit at the end. If I had asked for a selfie (or selfie-stick) video by the captain there would have been more movement, more dynamism, and the captain would automatically have been closer.
The above video looks rather soft. Veo 3 makes videos with 720p resolution (1280x720 pixels, aspect ratio 16:9) except that this clip has black bands top and bottom, lowering the effective resolution. I tried doubling the resolution using the Pollo.ai video upscaler and got the video below, now 2560x1440. Can you see any difference? I can, but you need to be viewing it full-screen on a screen that can handle that increased resolution of course. Pollo.ai video upscaling isn't free, but it uses very few credits so is cheap and worthwhile. On the other hand, there are probably other AI models that can do upscaling for free. CapCut perhaps?
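To put the upscale in perspective: doubling both dimensions quadruples the pixel count, so the AI has to invent three new pixels for every original one. A quick check, using the dimensions quoted above:

```python
# Pixel-count comparison: native Veo 3 output vs the 2x upscale.

def pixels(width, height):
    return width * height

native = pixels(1280, 720)     # 720p, as Veo 3 delivers
upscaled = pixels(2560, 1440)  # after the 2x Pollo.ai upscale

print(native, upscaled, upscaled // native)  # 921600 3686400 4
```

That 4x factor is why upscaling costs credits at all, and why the gain is only visible on a screen that can actually display the extra pixels.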
To continue exploring this site, use the menu at top-left. I recommend the Veo 3 section, and in particular my 'major project' to use Veo 3 to create a three minute video called The Quest. How can the making of a three minute video be a 'major project'? Well, there are so many variables that you can potentially control, and so much trial and error involved, especially in trying to make the characters look consistent from scene to scene, that this project will probably take three weeks to complete!