In this scene I want a jousting match between our hero, the Earl of Kilmarnock, and an anonymous opponent. The scene begins with King Brian telling his beautiful daughter, Princess Caroline, that the winner of the joust will take her hand in marriage.
For the first clip I send this photo of Brian, my father, to ChatGPT and then to Grok with this prompt:
Can you make this person into a medieval king, wearing fancy clothes and a crown and not wearing glasses. The image should be a mid-length portrait, 16:9 format. In the background there should be a crowd of people sitting in stands, cheering. The weather is sunny. The image should not be too symmetrical. Hyper-realistic imagery.
Here is the ChatGPT result. The king is okay, but the crowd are a blurry, unrealistic mess.
I ask ChatGPT to modify the image as follows:
That's not bad, but can you make it more photo-realistic? The colours are too warm and the faces in the crowd are too blurred. If you can't make the faces in the crowd more detailed, then blur the background. The image should be set in medieval times.
The new image is a bit better but the sun has apparently disappeared behind a cloud.
After making just two images for me today, ChatGPT tells me I've reached my daily image creation limit but that I can get up to 160 generations every three hours if I subscribe to ChatGPT Plus for $24 per month, which I do.
The two Grok images are as follows. The first has a good king but Grok doesn't seem to understand the concept of 'stands' as being raised seating where people sit (I confess it's not very logical). Also, the people in the background are in modern dress. Asking for 'a medieval king' was apparently not enough for Grok to know that the image should be set in medieval times. The second is not photo-realistic and there is an artifact over the right eye, but at least the crowd are dressed in medieval clothes. Still no stands, but we got a free castle. It's a shame Grok seems incapable of making images in portrait orientation.
I decide to go with the 'cloudy' ChatGPT image, though I lighten it before sending it to Veo. As usual, I have to extend the image in Pollo.ai, since ChatGPT makes images in 3:2 format and Veo wants 16:9. But Pollo has both an 'AI extend' and an 'uncrop' option. Which should I choose? Here are the results of 'AI extend' and then 'uncrop':
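To get a feel for how much new picture an uncrop has to invent, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not how Pollo.ai works internally: the helper name `uncrop_canvas` is hypothetical, and the 1536x1024 source size is an assumed example of a 3:2 image, not a size stated above.

```python
import math
from fractions import Fraction


def uncrop_canvas(width: int, height: int, target: Fraction = Fraction(16, 9)) -> tuple[int, int]:
    """Smallest canvas with the target aspect ratio that contains the whole image.

    Hypothetical helper for illustration only: it does the arithmetic and
    does not call Pollo.ai or any other service.
    """
    if Fraction(width, height) < target:
        # Too narrow for the target: keep the height and widen the canvas.
        return math.ceil(height * target), height
    # Too wide (or already matching): keep the width and add height instead.
    return width, math.ceil(width / target)


# Assumed 3:2 source size, for illustration only.
w, h = 1536, 1024
new_w, new_h = uncrop_canvas(w, h)
print(new_w, new_h)  # 1821 1024
print(new_w - w, "pixels of new width to fill, split between left and right")
```

Going from 3:2 to 16:9 means multiplying the width by 32/27, so roughly 18.5% of the final frame's width (about 9% on each side if the original is centred) is generated content. That is where stray extras like the man in the black suit can creep in.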
The second one, 'uncrop', is clearly better than the first, 'AI extend', because AI extend has apparently added someone in a modern black suit at the right side of the frame.
I send the 'uncropped' image to Veo 3 with this prompt:
The camera orbits slowly left. The old man says with a noble British accent “My dear daughter, Princess Caroline, today the tournament will end and the winner will take your hand.” As he talks he raises his right hand and points to something in the distance in front of him and to his right. We hear the crowd cheering in the background. Photo-realistic imagery.
Here's the result:
I like the result, though of course I would have preferred to have used my father's voice, via a voice clone. It's a shame the beginning of the word 'My' is cut off. Above all, it's a pity the colours are so warm - I don't know why ChatGPT is doing that.
UPDATE: In late August 2025, after I had already completed this project, Google released a new AI image generator called Nano Banana (officially Gemini 2.5 Flash Image). I gave it the same image of my father and the same prompt, and it generated these four images:
They're not bad, with decent photo-realism and brightness, but they are in portrait orientation (928x1120, roughly 5:6) rather than the landscape (16:9) orientation I asked for. They vary in emotion and in resemblance to my father; I think the first and last resemble him most.
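Pixel dimensions don't always reduce to a tidy ratio. A quick Python check (a minimal sketch; the helper name `simplest_ratio` is hypothetical) shows that 928x1120 is exactly 29:35, which is close to, but not quite, 5:6.

```python
import math


def simplest_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce a pixel size to its simplest whole-number aspect ratio."""
    g = math.gcd(width, height)
    return width // g, height // g


w, h = 928, 1120
print(simplest_ratio(w, h))              # (29, 35)
print(round(w / h, 3), round(5 / 6, 3))  # 0.829 0.833
```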
I asked Nano Banana (Gemini 2.5 Flash Image) whether it is capable of generating images in 16:9 landscape format and it said yes and suggested that I request that format even more specifically, which I did. It then generated four more images, all in portrait format except one, which showed a young king with a beard. The conversation continued like this (I have italicized the most interesting parts):
I said:
You have again failed to generate an image in 16:9 landscape format. You have only succeeded in one image out of 4, but at least you have demonstrated that you can do it when you choose. Why do you think you keep failing to follow this simple instruction and what can I do to improve my chances of success?
Nano Banana said:
I sincerely apologize for the continued frustration. You are correct, I have failed to consistently deliver on your request for a 16:9 landscape image, and that is a significant shortcoming on my part.
The reason for this behavior is that the image generation model can sometimes prioritize the composition and orientation of the source image over the new aspect ratio you request. The original image you uploaded has a strong portrait orientation, and the model is struggling to re-frame the person within a widescreen landscape. It defaults to a portrait composition because it's a "simpler" transformation for the AI to perform while maintaining a strong resemblance.
To increase your chances of success, you can try a few things:
Start with a more general prompt. Sometimes adding too many specific details can confuse the model.
Try a different landscape ratio. I have been successful in generating images with a 4:3 aspect ratio, which may be a better compromise between the source image and a landscape view.
Acknowledge the model's limitations. Sometimes the AI will struggle to perform multiple conflicting instructions (like changing the aspect ratio while also trying to maintain a very specific facial resemblance).
I tried requesting an image in 4:3 format as suggested, and Gemini said "I can create images of people, but not ones that portray a real person like that." This was after it had generated several images with a good resemblance.
Then I tried submitting a photo of Brian that was itself in 16:9 format, and at last got a good result in 16:9:
Would I have used Nano Banana instead of ChatGPT to generate images during my project if it had been available? Absolutely, as long as there is consistency in the image generation. Nano Banana has multiple advantages:
It can generate images in 16:9 format (once you have figured out how to make it do so reliably). That means I don't need to uncrop the images to 16:9 before sending them to Veo.
It generates images that are perhaps more photo-realistic and certainly less murky.
ChatGPT could only generate a very limited number of images per day with free access, forcing me to pay for a subscription. Nano Banana says it cannot tell me what the daily limit is for free access, but I suspect that it is generous.
I asked ChatGPT to use the 'cloudy' king image as the base for a new image, also sending this photo of my sister Caroline. I used this prompt:
Can you make an image which is an over the shoulder shot, looking over the king's shoulder at this person who should be dressed as a medieval princess, wearing a crown and handsome clothes. The image should be photo-realistic. The weather is cloudy, as in the image of the king. The background should be similar to the image of the king, with a blurred crowd sitting on stands.
There is something of a likeness, though I'm surprised ChatGPT made Caroline look older than in the photo I sent, as if trying to make a queen rather than a princess.
I sent this image to Veo with this prompt:
The camera orbits slowly right. The woman looks down, then back at the man and says with a noble British accent “I hope the Earl of Kilmarnock will win, father, and I think you want that too.” We hear the crowd cheering in the background. Photo-realistic imagery.
I'm happy with the result, except that I wasted some Veo credits by forgetting to use Pollo to extend (sorry, uncrop) the image into 16:9 format. But I have an idea for that...
Click HERE for the next clips.