There now has to be a dialogue between Sir Nigel and Simon, the Holy Snail, so I probably need to show Sir Nigel's face again and risk not being able to use Veo in Google Flow. I have an idea about that: Google Flow blocks me from uploading images of people, but does it block me from using images of people that I uploaded previously? There are plenty of images of Sir Nigel that I have already uploaded. They don't have mushroom forests in the background and I can't add that, but maybe I can zoom in so we don't see much background? Or maybe I could prompt Veo to immediately cut to an image with a different background? Looking through the images I have already uploaded, I realize there is only one that might work: disheveled Sir Nigel leaning against the door frame of The Rabbit Hole.
I ask ChatGPT to make a description of the environment in the snail-on-sofa images, to be used in a Veo prompt. Then I ask Veo to use the image of Sir Nigel leaning on the door frame with this very complex prompt in which the description from ChatGPT is in bold:
Cut instantly to a view in which this man is in a vast, otherworldly forest filled with towering bioluminescent mushrooms, their wide caps glowing with a soft, amber light that filters down through delicate gills. The mushrooms vary in size, some arching overhead like natural canopies, others forming clusters along the winding, moss-lined path. Warm, diffused illumination spills across the scene, casting gentle highlights on the textured stems and soft shadows onto the ground. Wisps of mist drift through the air, adding a dreamy, ethereal quality. Sparse patches of grass, lichen-covered rocks, and delicate wild plants grow along the path, creating a lush yet slightly surreal woodland environment. The overall color palette leans toward earthy greens and muted golds, with the light creating a magical, almost twilight glow. The man bows and says in a noble British accent "I kindly ask you for a jar of your precious slime, for apparently it can undo the curse that has been placed on Princess Caroline." Hyper-realistic.
Veo does not block the use of the human and gives this result:
Well, the 'instant cut' idea worked (I have to edit out the beginning, obviously) and the voice is okay, but the environment is certainly not consistent with the snail-on-sofa images, so this clip is not usable. Also, he walks forward, which I didn't ask for, and the rendering of the man is quite soft although the rest of the image is sharp. Why so soft? Is the softening, like the 'drift' we saw recently, a consequence of reusing an image?
Hand-held camera, panning slowly right. The man bows slightly and says in a noble British accent "I kindly ask you for a jar of your precious slime, for apparently it can undo the curse that has been placed on Princess Caroline." Hyper-realistic.
I'm getting impatient to make some progress today so I decide to have the starting image made in ChatGPT and then use my last Pollo credit to make the clip in Veo 3.
I ask ChatGPT:
Can you make an image where the reference man is visible in a head and shoulders shot in an environment consistent with the images of the snail on the sofa. It is a frontal view of the man, photo-realistic.
In the last image you made, the background is good and the shirt is correct but you have used the face from the snail and not the face from the reference image which I am uploading again. Please use the correct face, do not include the sofa in the image and do not give the person glasses. Photo-realistic. 3:2 landscape format.
That's better - it looks just like me! I send the image to Pollo with this prompt, crossing my fingers because I only have enough credit for this one last clip:
Hand-held camera, panning slowly right. The man bows slightly and says in a noble British accent "I kindly ask you for a jar of your precious slime, for apparently it can undo the curse that has been placed on Princess Caroline." Hyper-realistic.
Fortunately, it comes out okay. The camera does 'pan left' this time, so maybe Veo prefers the word 'pan' to the word 'orbit'?
I can continue to make clips in Google Flow as long as I don't show 'realistic humans' so I capture a frame from the clip with slime covering the sofa. I send that to Veo with the prompt:
The hand-held camera pans left. The snail continues to ooze slime onto the sofa and says with a French accent "I have buckets full of my wonderful slime and I sell them for one silver penny a jar or four pence for 5 jars." Hyper-realism.
Oops - Veo rejects this image with the message about 'photo-realistic people'. Really? It's obviously just looking for faces. This is so frustrating.
I recall that I have a free account at Google Cloud with access to Vertex AI which includes 'Veo 3 preview'. Does that mean reduced quality? It says I have 255 USD credit and 44 days left of the rather generous free trial. Will it accept the image that Google Flow just refused? Yes, but in the first clip I make (not shown) Sir Nigel walks right out of view, carrying a handbag that I didn't ask for. I try again with this prompt:
The hand-held camera pans left. The snail continues to slowly ooze slime onto the sofa and says with a French accent "I have buckets full of my wonderful slime and I sell them for one silver penny a jar or four pence for 5 jars." The man at the left remains static. Hyper-realism.
The result is usable. The man at left moves too much, bending forwards at the end to reveal an old man with grey hair, but I can edit that bit out. Is the quality lower than before? Maybe a bit, but it's okay, and above all it's good to be free to generate the clips I want without having to worry about restrictions (touch wood).
I reuse the image of Sir Nigel's head and shoulders, adding a one-second pause instruction to the prompt. The idea is that, once I delete that first second, the camera will already have moved and the clip will start from a different angle, I hope.
The camera zooms in slowly and pans to the right. After a one second pause, the man says in a pleading voice in a noble British accent "But I have no money - my purse was stolen. PLEASE give me a jar and help me save the Princess!". Hyper-realism.
The clip is not usable because he pronounces the hyphen in the prompt as if it were the word 'cadug' and pronounces the word PLEASE by spelling out the letters. Also, he walks forwards too much.
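Both problems come from punctuation and capitals in the spoken line, so they could be caught before a prompt is sent. Here is a minimal sketch in Python of the kind of clean-up I mean. The function name `soften_for_speech` is hypothetical, and the two rules are only my guess at what tripped Veo up in this clip, not a documented behavior of the model.

```python
import re

def soften_for_speech(line: str) -> str:
    """Rewrite a spoken line so it has no spaced hyphens or ALL-CAPS words.

    Assumption: these are the two things that were read aloud literally
    in the clip above. Note that genuine acronyms (e.g. USD) would also
    be lower-cased by this sketch.
    """
    # A hyphen with spaces around it is being used as a dash:
    # turn it into a sentence break and capitalize the next letter.
    line = re.sub(r"\s+-\s+(\w)", lambda m: ". " + m.group(1).upper(), line)
    # Lower-case any word written entirely in capitals (two or more letters).
    line = re.sub(r"\b[A-Z]{2,}\b", lambda m: m.group(0).lower(), line)
    # Restore the capital at the start of each sentence.
    line = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), line)
    return line

spoken = ("But I have no money - my purse was stolen. "
          "PLEASE give me a jar and help me save the Princess!")
print(soften_for_speech(spoken))
```

Hyphens inside words (such as 'snail-on-sofa') are left alone because the first rule only matches a hyphen with whitespace on both sides.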
I try again and the result (not shown) is okay except that Veo has added music! So I try again with:
The camera zooms in slowly and pans to the right. The man takes only two steps forward then says in a pleading voice in a noble British accent while raising the index finger of his right hand to indicate 'one' "But I have no money. My purse was stolen. Please give me just one jar and help me save the Princess!". No music or sound effects. Hyper-realism.
Bingo! (Except that he takes seven steps forward rather than two).
I notice that even though I'm producing multiple clips to get what I want, I am apparently not eating through any credits, which seems too good to be true.
I want some variety in the angles of the shots so I'll use a close-up of the snail's upper body and face. I first use Topaz Enhance on Pollo to upscale the image to four times the pixel count (i.e. double the width and double the height). How much difference does that make? If we zoom in and compare the original image with the upscaled image, you can see the improvement, e.g. at the top left, in the light in the eye, or on the rim of the glasses:
I crop the upscaled image and send it to Veo within Vertex AI with the prompt:
The camera zooms in. The snail oozes slime from its mouth as it says with a French accent "The thought of Princess Caroline makes me drool, but give away my slime?" He laughs then continues "You must be joking, young man". Hyper-realism.
I get this rejection: 'The prompt could not be submitted. This prompt contains words that violate Vertex AI's usage guidelines. Try rephrasing the prompt.' It was happy to accept the words 'ooze' and 'slime' so I guess the problem word is the word 'drool'. I try:
The camera zooms in. The snail says with a French accent "You think I'm going to give away my slime?" He throws back his head and laughs then says with a French accent while shaking his head vigorously "You must be joking, young man". Hyper-realism.
The lip sync is a bit off and I suppose the sniffing and smile at the end is just filler, but this is usable.
I'll probably want to apply a special effect to the voice of the snail, but which one should I use? CapCut offers at least one hundred effects and there are probably others I could use at ElevenLabs. Perhaps Veo can generate special effects too with a simple request in the prompt? I selected 20 special effects and put them side by side in this video. My favorites are 'helium', 'werewolf', 'golden monkey' and 'harmony'. Which would you choose?
I ask ChatGPT to generate a new image of Sir Nigel from a different angle, then uncrop that to 16:9 in Pollo and send it to Veo with the prompt (I was going to have him pull out an envelope and open it but that would not have fitted into the 8-second clip):
Hand-held camera pans left. The man says excitedly with a noble British accent without stepping away from his position. "But wait! What was in that envelope Sir Peter gave me as I started my quest?" As he says that he reaches into a pocket in his tunic and withdraws a silver coin. Then he says, while holding the coin out towards the camera, "We are saved! Here is a silver penny." Hyper-realistic.
The result is okay, though the sudden appearance of a breast pocket is awkward. Will people notice?
Click HERE for the next (and final) clips.