I want to see how well Veo handles water, so I had originally imagined Sir Nigel crossing a rough lake or sea, but how would he buy or rent a boat, given that he has been relieved of his purse? So I am resigned to the idea of him crossing a river instead. But wait! I have ideas that would allow him to use a boat after all. He could borrow one. But who would lend a valuable boat to the penniless tattered wreck that he has become? Or he could steal one. Would Sir Nigel ever stoop so low as to steal anything? I give the disheveled reference image to Midjourney with this prompt:
We are in medieval times. The man in the reference image is approaching a small rowing boat that is moored next to a small wooden jetty on the edge of a large lake. the weather is poor and the water is choppy. In the distance we see a couple of fishermen sitting together on the stony beach. The lake is surrounded by forest and mountains. Hyper-realistic.
Midjourney seems to have forgotten that I set a 16:9 format, and has given me square instead. None of the images are usable anyway, largely because the clothes are wrong. I want Sir Nigel to be much closer to the camera, although I didn't specify that. I did specify 'choppy water' and in this, the best image, that's not what it has given me.
I try again with this prompt and try turning up the 'Omni strength' from 100 to 200 for better compliance with the Omni reference image of disheveled Sir Nigel.
We are in medieval times. The man in the reference image is approaching the camera and is approaching a small rowing boat that is moored next to a small wooden jetty on the edge of a large lake. The weather is poor and the water is choppy. The reference man is about 5 meters from us and in the distance we see a couple of fishermen sitting together mending fishing nets on the stony beach. The lake is surrounded by forest and hills. Photo-realistic.
This is the only image that is possibly usable. It's very photo-realistic but it doesn't show the fishermen, which is okay. In all the other images he is walking away from the boat, despite a clear prompt that he should be approaching the boat. But I'll have to ask ChatGPT again to shorten the coat into a shirt.
The problem is that the reference image doesn't show his legs, so I work with ChatGPT to get a full-length portrait of disheveled Sir Nigel and get this first image. The face is different so I use my photo editor to combine the top half of the original image with the new lower half to get the second image which resembles me better (I hope).
But ChatGPT doesn't do a good job of shortening the long coat by the lake into a tunic - it makes it into a T-shirt again. So I decide to delete the man completely and then put Sir Nigel back. I don't have a 'magic eraser' option on my PC so I have to send the lake image to my phone and use the Pixel camera app to instantly erase the man by the lake:
After 20 minutes of editing, I manage to put this ChatGPT-generated Sir Nigel in the image, but he's facing the camera, not the boat, so maybe he should speak to the camera?
Some might wonder why Sir Nigel doesn't just walk around the lake, so let's call it a river instead.
I need to know whether I can use Veo 3 with people on other platforms, so, even though it's expensive (and quite likely to give an unacceptable result), I send the image to Pollo with this prompt:
The camera slowly zooms in on the person. We see and hear the waves crashing onto the shore. The man points at the camera and says firmly with a noble British accent. "I want you people to know that I'm just borrowing this boat." He shouts "Sir Nigel is not a thief!" then turns and walks towards the boat. Photo-realism.
That worked well, which is a relief. The waves are very realistic. Sir Nigel steps towards the camera, which wasn't asked for, but that's okay. Good to know that I can still generate Veo 3 videos with people and sound, but frustrating that I have to do that in Pollo where I have only enough credits for two more videos whereas on Google Flow I have enough credits to generate another 110 videos.
I suppose I now have to show Sir Nigel rowing the boat, which will be another good test of a dynamic motion, plus the challenge of getting the waves right. But I need something more than just 'rowing' to happen in this clip. Should he talk to the camera again? Should something jump out of the water? I decide to try to make him sing 'Rule Britannia' while we see a large shark's fin appear in front of the boat.
But first I have to generate the starting image. I send the full-length image of disheveled Sir Nigel and the image of the boat with no people to ChatGPT with the prompt:
Please put this reference man in the rowing boat that you see in the other uploaded image - he is rowing the boat across the river and we see him from the stern of the boat, so he is facing the camera. Photo-realistic.
Everything looks fine except that stupid Sir Nigel is trying to row the boat while it's still on the beach! I try again with:
He cannot row the boat while the boat is sitting on the beach! Please put the boat in the middle of the river and make the composition less symmetric by setting the camera somewhat off to one side. Also, make the image lighter.
Okay, so now we're on the water, but his face has 'drifted' away from a resemblance to Sir Nigel and the detailed structure of the boat has been smoothed out into something less interesting. I think the drifting is a typical AI phenomenon - it happens within clips too since each frame is derived from the previous one - it's the reason AI video generations are usually limited to just a few seconds. The waves have become less rough and, as in the previous image, the boat is pointed up the river rather than across it. The image is still symmetric. I decide to edit the first image to get rid of the beach:
I take the risk of sending the (expanded) image to Pollo with this prompt:
The camera tracks the man as he rows the boat across the river. The water is choppy and there is some wind noise. The man sings strongly, with a noble British accent and without any musical accompaniment "Rule Britannia, Britannia rules the waves. Britons never never never shall be slaves" In front of the boat we see a large shark's fin appear and slice through the surface. Hyper-realism.
Well, that's usable, although the shark's fin appearance is so disconnected from the rest that I'll have to edit it out. Veo obviously doesn't know the tune for this song, but at least Sir Nigel does sing, unlike the earlier clip where I tried unsuccessfully to make that happen. The wave action and the rowing are less vigorous than I wanted.
This should be fun! We are approaching the Holy Snail's domain at last, by passing through a magical mushroom forest. To get inspiration for this, I use the Explore tab in Midjourney, where I search for 'mushroom forest' and find images like these:
Which one do you prefer? I'm going to use the first one, but I'll keep the second one in mind for the actual meeting with the Holy Snail. I'm thinking that if I make an image of Sir Nigel walking through the forest seen from the back then there is a good chance that Google Flow won't block me. I send the first image and disheveled Sir Nigel to ChatGPT along with this prompt:
Can you put the man in the uploaded image into the magical forest? He is walking away from us, so we only see his back. He is using a walking stick. Hyper-realism.
Yes, I know that the phrase 'hyper-realism' may not be appropriate in this case, but here we go. The image comes out fine (not shown) and I send it to Google Flow (not Pollo) with the complex and ambitious prompt:
The camera tracks the man as he walks forward through the magical mushroom forest. The huge mushrooms sway gently. He says "Something tells me I must be approaching the home of the Holy Snail, but where is he exactly? Maybe this rabbit can show me the way?" As he walks a rabbit appears and the man pauses and then continues walking, following the rabbit. There is the magical sound of tinkling bells and we hear whispering voices, though we cannot make out what is being said. There is also the sound of dripping water and we see water dripping from the vine hanging overhead. There is a slight echo. Hyper-realism.
Oops, I forgot to specify 'noble British accent' and this time I get an American voice (video not shown). That confirms that Veo has no memory from one clip to the next (unless you use the 'extend' feature in the scene builder?). There is no swaying, whispering, dripping water, tinkling bells or echo. Worst of all, the walking stick suddenly disappears halfway through the clip. I have plenty of credit left on Google Flow so I decide to run a similar prompt in Quality mode:
The camera tracks the man as he walks forward through the magical mushroom forest. The huge mushrooms sway gently. He says in a noble British accent "Something tells me I must be approaching the home of the Holy Snail, but where is he exactly? Maybe this rabbit can show me the way?" As he walks a small rabbit appears and the man looks down at it and then continues walking, following the rabbit. There is the magical sound of tinkling bells and we hear whispering voices, though we cannot make out what is being said. There is also the sound of dripping water and we see water dripping from the scenery overhead. There is a slight echo. Hyper-realism.
The generation fails. I try again. Success, sort of. But the clip (not shown) is unusable because instead of looking down, he looks right and his profile is nothing like Sir Nigel. Instead it is cartoon-like. Really disappointing to get that in this expensive 'quality mode'. The voice is now British but it's not the same voice we have heard before. It occurs to me that I could perhaps make a clone of the normal Veo-generated Sir Nigel voice in ElevenLabs to use when Veo gives me the wrong voice. There is still no swaying, whispering, tinkling bells or echo. There is some water raining down, but no sound to match that. Maybe the AI is just too busy generating scenery as the man advances?
But wait! The 'failed generation' that made me start again in Quality mode is now showing up, and it's usable, so I probably did the last expensive and unusable generation for nothing. I've sometimes had 'failed generations' before that then appear after all, but I still haven't learnt my lesson.
UPDATE: As I finished this project I redid a few clips I was not satisfied with, including this one. I tried a new ChatGPT prompt to have Sir Nigel walking towards the camera with no walking stick:
Please take the man in this uploaded image and place him in this magical forest scene. He is walking towards the camera and at his feet just in front of him is a cute small white rabbit. Photo-realistic. 3:2 landscape format. The image should not be too dark.
By the way, from now on I will not include the phrase 'Photo-realistic. 3:2 landscape format. The image should be bright. Neutral white balance.' when I tell you the ChatGPT prompts I am using, so as not to bore you. But you should assume I am still always including those words unless I indicate otherwise.
The generation failed. I try again and it works. I uncrop the image in Runway and then notice something very perturbing: Runway is doing a very bad job with the uncropping, as you can see by comparing these two images which are details from the original ChatGPT image and the uncropped one:
Part of the problem is that Runway is producing images with lower resolution (1173x660) than the input image (1536x1024), whereas you might expect the opposite since it is supposed to be extending the original image on the sides. I haven't used Runway much for uncropping - I was using Pollo until I ran out of credit there. I hope Pollo was not also doing such a poor job.
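To put numbers on that, here is a small Python sketch using only the two resolutions quoted above. It assumes that an ideal uncrop to 16:9 would keep the original 1024-pixel height and only add width on the sides; the function names are made up for illustration and nothing here calls Runway.

```python
from fractions import Fraction

def aspect(width, height):
    """Return the aspect ratio (width / height) as a reduced fraction."""
    return Fraction(width, height)

def ideal_uncrop_width(height, target=Fraction(16, 9)):
    """Width needed to reach the target ratio while keeping the height (rounded down).

    Assumption: a good uncrop keeps the original pixels and only widens the canvas.
    """
    return int(height * target)

original = (1536, 1024)  # ChatGPT image, 3:2
runway = (1173, 660)     # Runway's uncropped output

print(aspect(*original))                  # reduced ratio of the original
print(float(aspect(*runway)))             # compare with 16/9
print(ideal_uncrop_width(original[1]))    # width a same-height 16:9 uncrop would need
print(round(runway[1] / original[1], 3))  # vertical scale Runway applied
```

Under that assumption the uncropped image would be about 1820x1024, so Runway's 1173x660 output has been scaled down to roughly 64% of the original in each dimension, which fits the loss of detail in the comparison.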
I know ChatGPT currently can't generate images in 16:9 format, unfortunately, so I ask it to make a new 3:2 image with a shrunken version of the previous image at its centre and outpainted extensions on all sides.
Since you are not able to generate images in 16:9 format, I ask you to create an image in 3:2 landscape format which contains the image you just made for me at the centre, outpainted in all directions by about 15%. The original part of the image should be changed as little as possible, except in size. Photo-realism. A bright image.
That worked well enough, although I should have said 20% rather than 15%. The face is so bad that I take the time to superimpose the face from the 'disheveled Sir Nigel against a white background' image, which works well. I also crop to 16:9 of course. Then I send the image to Veo with the prompt:
The camera tracks the man as he walks forward through the magical mushroom forest, still surrounded by huge glowing mushrooms. The atmosphere is magical. The huge mushrooms sway gently. As he walks the small white rabbit runs along in front of him and the man looks down at it and continues walking, following the rabbit. He says in a noble British accent "Something tells me I must be approaching the home of the Holy Snail, but where is he exactly? Maybe the rabbit can show me the way?" There is the magical sound of tinkling bells and we hear whispering voices, though we cannot make out what is being said by them. There is also the sound of dripping water and we see water dripping from the scenery overhead. Hyper-realism.
The result is nice although the face is too smoothed, the rabbit gets left behind, there is no swaying, no whispering voices and no tinkling sound (though it does include bells!). Apart from that it's fine! At least we have the dripping water.
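Looking back at the outpainting step, here is a rough Python check of why 20% might have been a safer figure than 15%. It assumes that 'outpainted by p' means the new 3:2 canvas is (1 + p) times the original in each dimension, and that the 16:9 crop keeps the full width and trims only the top and bottom. Both are my reading of the prompt, not something ChatGPT reports, and the function names are made up.

```python
def original_height_share(outpaint):
    """Fraction of the canvas height taken up by the shrunken original.

    Assumption: the canvas is (1 + outpaint) times the original's size.
    """
    return 1 / (1 + outpaint)

def crop_height_share(canvas=(3, 2), target=(16, 9)):
    """Fraction of the canvas height kept when cropping to the target ratio at full width."""
    cw, ch = canvas
    tw, th = target
    return (cw * th) / (tw * ch)

kept = crop_height_share()  # share of a 3:2 canvas that survives a 16:9 crop
for p in (0.15, 0.20):
    share = original_height_share(p)
    # True means the whole original still fits inside the 16:9 crop
    print(p, round(share, 3), share <= kept)
```

On this reading, a 15% outpaint leaves the original taking up about 87% of the canvas height, more than the roughly 84% that survives the crop, so the crop trims a little of the original image. At 20% it takes up about 83% and fits.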
Yes, it's time to meet the Holy Snail. But what on Earth should the Holy Snail look like? How big should it be? You've guessed by now, by a process of elimination, that the Holy Snail must be called 'Simon', the name of my only sibling who has not yet appeared in this story. For the other siblings, I tried to create characters whose faces resemble theirs, at least a little, but is there any point in trying to project Simon's face onto that of a snail? Should I just use his name and leave it at that?
I check out what Midjourney has to propose when I search for images with the prompt 'The Holy Snail'. Here are some of my favorite images:
Which do you prefer? I certainly like the first image most, and it just so happens to have the most human face. I also like the idea of the Holy Snail carrying a gun, and perhaps I will ask for it to be wearing a cross around its neck.
Earlier I said I liked the image with the sofa in the mushroom forest. Can I combine the sofa image with the snail image to maybe get the snail sitting on a large sofa? You can only use one 'Omni-reference' image when you generate an image in Midjourney, typically the image of a character who you would like to put in your image. I have often used the 'Omni-reference' feature. But Midjourney also allows you to use multiple 'image prompt' images to inspire an image generation, and this is something I have yet to try. Now is the time. I load the sofa image and old snail image as image prompts and add this text prompt:
The old snail is sitting on the sofa in the magical mushroom forest.
I get four images that I don't like very much. This is the best of the four:
I try something different - I upload the old snail photo and this photo of my brother Simon to ChatGPT and say:
Can you project the man's face onto the snail's face? 3:2 landscape format. Photo-realism.
I'm astonished by the result and say to ChatGPT "Wow!" to which it replies "Glad you like it! That one turned out pretty striking — the combination of the snail’s textured detail and the human expression gives it a weirdly lifelike, almost storybook feel."
Another "Wow" - that's a strikingly human-like response from ChatGPT!
I really like the image, even the presence of the glasses, and the resemblance isn't too bad. Is Veo going to refuse to upload this image because it looks too much like a human?
I upload the original sofa image to ChatGPT and ask:
Can you put the snail on a sofa in a magical mushroom forest like this one. Make the human face greener to match the rest of the snail. 3:2 landscape format. Photo-realistic.
The generation fails and I try a second time and get this, still with the glasses but without his ear, which is good:
That's pretty good but you've made the face look much older. Can you give me a similar image but with a face that is more consistent with the younger face that I uploaded? Can you also remove the post near the sofa, make the colors less warm and give me the 3:2 landscape format that I asked for? Photo-realistic.
We're getting closer except that the resemblance is less good than in the first "Wow" version so I have a go at putting the original head back using my Affinity Photo editor:
That sort of worked but was a waste of time because I now have to ask ChatGPT to insert Sir Nigel into the image and that's going to mess up the face again:
So I again replace the face with the earlier one and have what I hope will be the definitive image, ready for uploading to Veo:
I send the widened image to Veo with this prompt:
The hand-held camera slowly zooms in. The snail looks at the man and says with a French accent "Welcome, Sir Nigel. I've been expecting you. My name is Simon; What can I do for you?" As he talks, he slowly advances along the sofa and slime drips from his body onto the floor.
Oh, the slime! And even the 'secretion' sound at the end (if that's what it is)! I'm glad this wasn't rejected as too human. It's impressive that Veo can animate a human-faced snail - it can't have had much training data on that! The only thing wrong is that Veo doesn't seem to be able to give the requested French accent. But maybe I can add some distortion in CapCut at some point. Also, I forgot to put a cross around his neck to justify the 'Holy' name.