I gave ChatGPT this image of my brother Peter and this prompt:
Can you make an image of this person dressed in a silvery suit of armor except for his helmet which he holds under his right arm? It's a mid-length portrait, photo-realistic, with the same feel and colour scheme as the previous image of the king and the princess. In the background is the same out-of-focus crowd as in the previous image. The weather is cloudy.
I really like the result because the likeness is relatively good. ChatGPT seems to like beards more than I do.
At that point ChatGPT makes a proposition and we have an interesting conversation about the 16:9 issue...
ChatGPT said:
Since we now have the king, princess, and knight in the same style, would you like me to create a scene with all three together (in the arena setting, same cloudy weather and crowd) for continuity? It could look like a grand medieval moment.
I said:
Yes please. You can only make landscape images in 3:2 format, not 16:9, is that correct?
ChatGPT said:
Exactly — at the moment I can generate landscape images in 3:2 format (which is slightly taller than 16:9).
That said, if you specifically want a 16:9 composition, I can still design the scene with a wide cinematic framing in mind. It will technically be 3:2, but the proportions and layout can be arranged so that it feels like 16:9 (e.g., centered subjects, space on the sides).
For the scene with the king, princess, and knight, do you want them standing together in a formal pose (portrait style) or engaged in a scene (e.g., the king addressing the crowd, the knight at attention, the princess beside him)?
I said:
If I were to ask you to put black bars top and bottom so that the remaining space has a 16:9 ratio, could you do that?
ChatGPT said:
Yes — I can absolutely do that. Here’s how it would work:
I generate the 3:2 image (the standard output).
I then add black bars (letterboxing) at the top and bottom so the visible content inside the bars matches a 16:9 ratio.
That way you get the cinematic 16:9 feel, without distorting or cropping the original image.
Do you want me to create the king, princess, and knight scene in 3:2 and then apply the black bars for a 16:9 version?
I said:
I would want you to conceive the image such that nothing important is lost when the black bars are applied. Can you do that?
ChatGPT said:
Yes — I can definitely do that. Here's how I'll approach it:
I’ll compose the image with a 16:9 “safe zone” in mind — meaning all key elements (faces, hands, crowns, gestures) will stay within that horizontal band. The extra height in the 3:2 frame will only include non-critical content (like sky or lower background), so when I apply the black bars, nothing important will be hidden. The result: a cinematic 16:9 image with full integrity, just in a padded 3:2 container.
Wonderful! I won't have to use Pollo anymore to convert the images from 3:2 to 16:9! But the image that ChatGPT generates is this:
It has completely failed to process the image as we had agreed! I ask it to try again and get this image. It's gone back to the usual 3:2 format and hasn't been consistent with the original clothes, though it is somewhat consistent with the clothes in the previous image of the three of them, proving that it can do it.
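For reference, the letterboxing discussed above comes down to simple arithmetic. The sketch below is a minimal illustration, not what ChatGPT does internally: it computes how tall the black bars must be so that the visible band of a 3:2 frame has a 16:9 ratio. The 1536 x 1024 pixel size is an assumption chosen as a typical 3:2 landscape output, and the function name `letterbox_bars` is hypothetical.

```python
from fractions import Fraction


def letterbox_bars(width, height, target=Fraction(16, 9)):
    """Return (visible_height, top_bar, bottom_bar) in pixels.

    The visible band keeps the full width and has the target aspect ratio;
    the remaining height is split between a top and a bottom black bar.
    """
    # Height of the band that gives width:visible == target (truncated to whole pixels).
    visible = int(width / target)
    if visible > height:
        raise ValueError("image is already wider than the target ratio")
    spare = height - visible
    top = spare // 2
    bottom = spare - top  # takes the extra pixel if spare is odd
    return visible, top, bottom


# Assumed 3:2 frame of 1536 x 1024 pixels.
print(letterbox_bars(1536, 1024))  # (864, 80, 80)
```

Under that assumption, anything important (faces, hands, crowns) would need to sit between rows 80 and 944 of the frame to survive the bars.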
I widen the latest image then pass it to Veo with the prompt:
The camera orbits slowly left. The woman says with a noble British accent “Peter, Earl of Kilmarnock, take this silver coin that it may bring you victory.” As she speaks she passes a small coin to the knight who accepts it and bows slightly. We hear the crowd cheering in the background. Photo-realistic imagery.
That's odd! Instead of taking the coin she offers, Peter magically produces his own coin which he passes to the king! Bad Peter! I modify the prompt:
The camera orbits slowly left. The woman says with a noble British accent “Peter, Earl of Kilmarnock, take this silver coin that it may bring you victory.” As she speaks she holds out a small silver coin. The knight accepts the coin into his empty hand. He then bows slightly. The king's hands remain clasped together - he does not give or take any coin. We hear the crowd cheering in the background. Photo-realistic imagery.
WTF? This Peter character is really determined not to do as he is told. He even rudely pushes the princess's hand out of the way! How can I make the prompt better? Is the problem that I said 'knight' and 'king' when I should have said 'man on the right' and 'man on the left'? I ask ChatGPT for help:
I said:
I sent that image to Veo 3 with this prompt: "The camera orbits slowly left. The woman says with a noble British accent “Peter, Earl of Kilmarnock, take this silver coin that it may bring you victory.” As she speaks she holds out a small silver coin. The knight accepts the coin into his empty hand. He then bows slightly. The king's hands remain clasped together - he does not give or take any coin. We hear the crowd cheering in the background. Photo-realistic imagery." In the video that Veo makes the princess offers a coin which the knight does not take. Instead he produces his own coin which he passes to the king, who accepts it. Can you suggest an improved version of my prompt that will make the knight accept the coin offered by the princess while the king does nothing?
ChatGPT said:
Ah, I see exactly what’s happening — Veo 3 is being a bit creative in interpreting the interaction because your original prompt isn’t crystal-clear on the sequence of actions and the roles of each character.
To keep Veo focused on only the princess offering and the knight accepting, while the king does nothing, we need to:
Make the action order explicit.
Clearly state that the knight’s hands are empty until he takes the coin.
Explicitly say that the king never interacts with any coin.
Here’s a refined version of your prompt:
The camera orbits slowly to the left. The woman, standing arm-in-arm with the king, speaks with a noble British accent: “Peter, Earl of Kilmarnock, take this silver coin that it may bring you victory.” As she speaks, she holds out a small silver coin directly toward the knight. The knight’s right hand is holding his helmet, and his left hand is empty. The knight accepts the coin into his empty left hand. After taking the coin, he gives a small bow. The king’s hands remain clasped together the entire time — he does not take or give any coin. In the background, the crowd cheers loudly. The scene is photo-realistic.
I modify that some more because the knight is holding his helmet in his left hand, not his right. How could ChatGPT not know that? Perhaps because in the original photo of Peter alone he's holding the helmet in his right hand.
The camera orbits slowly to the left. The woman, standing arm-in-arm with the king, says with a noble British accent: “Peter, Earl of Kilmarnock, take this silver coin that it may bring you victory.” As she speaks, she holds out a small silver coin directly toward the knight (the man on the right). The knight’s left hand is holding his helmet, and his right hand is empty. The knight accepts the coin into his empty right hand. After taking the coin, he gives a small bow. The hands of the king (the man on the left) remain clasped together the entire time — he does not take or give any coin. In the background, the crowd cheers loudly. The scene is photo-realistic.
Grrrrrrrr! He STILL doesn't take the coin, though he ALMOST does. That'll have to do, because the failed clips are eating through my Veo credits. I now have 570 credits left from this month's 1000 credits - enough for another 28 clips. I note that the camera fails to orbit left. It didn't orbit in the previous clip either, though it did in the first. It's really important to have some camera movement to give more of a sense of 3D.
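The credit budget is a quick integer division. This small sketch uses the figures quoted in this post (570 credits left, 20 credits per Veo 3 Fast clip, 100 credits per Veo 3 Quality clip); the function name `clips_affordable` is hypothetical.

```python
def clips_affordable(credits_left, cost_per_clip):
    """Number of whole clips that the remaining credits can pay for."""
    return credits_left // cost_per_clip


print(clips_affordable(570, 20))   # 28 clips with Veo 3 Fast
print(clips_affordable(570, 100))  # 5 clips with Veo 3 Quality
```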
UPDATE: As I was finishing this project I tried once again and at last Sir Peter took the coin... which he then dropped on the ground, causing all three people to look to the ground! Maybe the AI associates coins with coin-tossing? I may be able to use the clip by removing the last second.
I gave the original image of Sir Peter to Veo with this prompt:
The camera zooms slowly on the man's face. The man says with a British accent in a deep, strong voice “My Lady, I shall fight with honor and skill until victory is ours". As he speaks he bows slightly and places his right hand on his heart. We hear the crowd cheering in the background. Photo-realistic imagery.
That'll do. I hadn't noticed that he was holding his helmet in his right hand. The camera doesn't zoom.
I use this prompt
Can you make an image of the knight standing next to a white horse, about to mount it on the horse's right side? His valet is with him, ready to help him. The knight is now wearing his helmet. The lance is on the ground next to them. Cloudy weather with light rain.
and get this image:
I send this prompt:
Not too bad but it looks like the knight is about to climb on the man, not the horse. Can you move the knight right next to the horse and make the valet look towards his knight?
This is the result:
You have put him on the horse already, and the horse seems too small to carry a man in heavy armor. Okay, let's keep him on the horse but please make the horse bigger and please reverse the whole image so we see the horse from its right side.
Please make the horse significantly bigger and do not show the lance in the image.
That is the perfect size for the horse. Please give me the same image reversed so that we can see the horse's right side and please always produce images in 3:2 format.
At last I get a usable, beautiful image. Why is it such hard work?
I uncrop the image and send it to Veo with this prompt:
A young valet appears from the left carrying a long white and red lance which he passes to the man on the horse. The valet says to the man on the horse with an enthusiastic young British voice in a British accent "Good luck my lord - the Black Knight is undefeated". It is raining lightly. Photo-realistic imagery.
Who the heck is Lord Cladian? And why will no one ever take things that are passed to them? I try again with this revised prompt:
A young valet appears from the left carrying a very long (about 7 meters long) white and red lance which he offers to the man on the horse. The man on the horse takes hold of the lance with his right hand and holds it level. The man who offered the lance no longer holds it since it has been taken by the man on the horse. The valet says to the man on the horse with an enthusiastic young British voice in a British accent "Good luck my lord - the Black Knight is undefeated". It is raining lightly. The crowd in the background is cheering. Photo-realistic imagery.
Incredible! It seems that in Veo it is impossible for one person to pass any object to another person. Instead, a second lance appears out of nowhere! I wonder whether other AI video generators have the same problem. In particular, I wonder whether Veo 3 Quality has this problem. It produces somewhat better videos than the Veo 3 Fast model that I have been using, but at 5 times the cost (100 credits vs 20 credits per video). My credits have to last another 3 weeks so I can't afford 100 credits, so I'll use that clip and try to edit it so that the two lances are never visible at the same time. I have another idea. Since it's so easy to do a French version, I will probably do that, so why not do a French version of this clip and see what happens?
There is still weird stuff going on with the lance, a little twitch in the knight's head at the end (perhaps intended to be his shock at hearing that the Black Knight has never been beaten) and an unintelligible word added after the word 'monseigneur'. But maybe I can edit it to make it usable.
UPDATE: As I finished this project I had another go at running the same prompt as before and it was rejected on the basis that it contains 'sensitive word(s)'. I assume that the problem word is 'lance' so I try replacing that with 'pike'. Still rejected. Could the problem word be 'undefeated'? I try replacing that with 'rarely misses'. Again rejected. When I replace 'lance' with 'pole' and 'The Black Knight is undefeated' with 'May your aim be true', the prompt passes and the knight DOES take the pole, but the pole doesn't have the spiral pattern that appears in later clips and it is the knight who wishes good luck to the valet! So the clip is unusable.
I continue to try other prompts and eventually have success with this prompt (I had tried using the phrase 'jousting pole' but Veo rejected the word 'jousting'):
Hand-held camera. A young man appears from the left carrying a 6 meter long red pole which he offers to the man on the horse. The man on the horse uses his empty right hand to take hold of the long red pole that was offered to him. Then the young man who has given the pole to the horseman leaves the scene to the left. The young man at the right says to the man on the horse with an enthusiastic young British voice in a British accent "Good luck my lord. May your aim be true". The man on the horse says nothing. It is raining lightly. Photo-realistic imagery.
It's still awkward, and the pole is too short, but it'll have to do.
Click HERE for the next clips.