This will be relatively complex and difficult. I try this prompt in ChatGPT:
I would like you to make an image where we are looking over the shoulder of a second knight who is dressed in black armor and sitting on a black horse and carrying a long (7 meter) red and white lance. This knight is about to charge towards the previous knight in silver armor who is visible in the distance. There is a rail that will ensure that the two horses cannot collide. As in the previous images, there is light rain and a blurred crowd is visible in the background. Photo-realistic imagery and 16:9 format. If you can propose two images that would be good.
The image is a good start but with some obvious problems which I try to correct with this prompt:
The black knight looks very good. Can you put the black horse on the other side of the rail barrier and put a similar lance under the right arm of the silver armored knight? Both knights should be holding their lances with their right hands. The two knights should be three times further apart, but the silver armored knight is still clearly visible because the camera has a telephoto lens. The image should be somewhat lighter. Photo-realistic imagery and 16:9 format. If you can propose two images that would be good.
I'm not getting multiple images from one prompt so I guess ChatGPT can't do that. Next prompt:
That's an improvement but each knight should be pointing his lance towards his opponent and the silver armored knight should be much further back. Also, the white horse needs to be bigger. Photo-realistic imagery and 16:9 format.
I don't know whether I need to keep repeating that last sentence. I understand Veo 3 has no memory from one clip to the next (except that you can reuse images and prompts), but I suppose that within a ChatGPT conversation there is memory.
Better in some ways but I need you to put the white horse further away from the camera and put the black horse on the other side (the left side) of the barrier. Plus, there should be only one barrier.
Please change the camera angle to telephoto. Do not swap the position of the two knights. Leave the silver knight on the side of the barrier where it is now and move the black knight to the other side of the barrier. Be sure that each knight is holding his lance under his right arm with this right hand and that each lance is pointed towards the opponent.
This is getting nowhere fast. I decide to take two of the images and work on them in my photo editor, Affinity, to try to get what I want. Here's the result:
Of course, the horse is missing its back legs and the front ones look odd, but maybe it'll look okay in Veo. I'm more worried that the black knight will be on the wrong side of the barrier or that the clash will work badly, especially as the black knight is holding the lance in his left hand. I try this prompt:
The camera tracks the black knight. The two knights charge towards one another, each trying to strike the other with his lance. The black knight is on the opposite side of the barrier than the silver armored knight. There is lightning in the far distance beyond the trees. The black knight successfully and noisily strikes the other knight, and his lance explodes into splinters. The crowd cries out loudly in consternation. The silver armored knight falls to the ground. Cut to a static view of the silver armored knight on the ground, crying out in pain. Light rain. Photo-realistic imagery.
It's not perfect, but it's better than I expected. The feeling is right. Veo gives hind legs to the white horse, as I had hoped. Both horses are on the same side of the barrier and they attack one another. The black knight doesn't hit Lord Peter with his lance but rather with his explosive right arm. There's lots of lovely mud. It's usable, but I'll have another go, removing the bit about the lance exploding. I also do some more photo editing work to make the barrier taller, but I forget to update the image in Veo:
The camera tracks the black knight. The two knights charge towards one another, each trying to strike the other with his lance. The black knight is on the left side of the barrier, the opposite side of the barrier than the silver armored knight. There is lightning in the far distance beyond the trees. The black knight successfully and noisily strikes the other knight with the point of his lance and the silver armored knight falls to the ground. The crowd cries out loudly in consternation. Cut to a static view of the silver armored knight on the ground, crying out in pain. Light rain. Photo-realistic imagery.
It's better than before, with a good lance hit, but a knight says "Ugh, my arm" before the clash and the fall. I hope to use the audio from the first video with the visuals from the second. Or maybe I can move the "Ugh, my arm" to after Lord Peter hits the ground. There isn't the cut that I asked for this time, though there was in the first clip. There are some good squeals from the horses. The 'charge' is more of a trot. The 'seven meter lance' that Lord Peter is holding looks more like two meters long, and he doesn't seem to know what he's supposed to do with it. Let's try one more time, this time with the taller barrier:
The camera tracks the black knight. The two knights charge quickly towards one another, each trying to strike the other with his lance. The black knight is on the left side of the barrier, the opposite side of the barrier than the silver armored knight. There is lightning in the far distance beyond the trees. Both knights successfully and noisily strike one another with the points of their lances and the silver armored knight falls to the ground. The crowd cries out loudly in consternation. Cut to a static view of the silver armored knight on the ground, crying out in pain. Light rain. Photo-realistic imagery.
It's probably slightly better - at least Lord Peter seems to be pointing his lance in the right direction. But the hit with the lance is less convincing than in the previous version. The horses are running a bit faster. The bit after the cut is wasted, but can be easily removed. The two horses are once again on the same side of the barrier. Lord Peter's mask changes its shape to expose his face when he hits the ground.
UPDATE: I return to this clip as I finish the project, trying to get a better result. To keep the pattern on Sir Peter's lance consistent, I have to colour the lance red in my image editor. I also move Sir Peter to the opposite (left) side of the barrier, since Veo refuses to accept that the Black Knight is on that side, but in the generated clip Veo puts both jousters between a pair of parallel barriers, so they make a 'head on collision' once again. Look carefully and you'll notice that at the end of the clip the Black Knight's horse passes right through one of the barriers, but I don't suppose many people will notice.
In this clip I want the Black Knight to be presented to the king and princess so that he can claim his prize. I ask ChatGPT to modify the image that came out in portrait format with black bars top and bottom like this:
I would like you to replace the silver armored knight on the right with a knight in black armor, his mask closed, the same black knight as in the other images. The image should be in 3:2 format, rather lighter than this one. There is light rain. Photo-realistic.
ChatGPT gives me an image with a horse I didn't ask for, in a square format I didn't ask for, which is too dark and with colours that are too warm. I'm increasingly disappointed with ChatGPT and decide to try something else. After all, ChatGPT is not a dedicated AI image creator.
I've watched a lot of videos about AI image creation and AI video creation and there is one AI image creator which gets mentioned more than any other for the artistic nature of its creations. That is Midjourney (it also does AI video creation). Click the link to see examples of what it can generate. It is not possible to create with Midjourney without paying, so I signed up for the basic plan, which costs $12 per month. The basic plan should allow me to create 50 sets of 4 images per month (Midjourney always creates four images for each prompt). Midjourney has an 'Omni-Reference' feature which should be very useful to me for generating different images with a resemblance to a real person, as I am trying to do. Another advantage of Midjourney is that you have the option of specifying the aspect ratio you want for the image, including 16:9. My worry with regard to Midjourney is that it seems to be oriented towards stylized artistic images rather than photo-realistic images. Oops, I just noticed that I can access Midjourney from within Pollo.ai, so I didn't necessarily need a separate Midjourney subscription. However, each Midjourney generation of four images within Pollo.ai costs 25 credits, and it may not be using the latest model, so it's possible I did the right thing.
This video gives a nice introduction to Midjourney's Omni-Reference feature:
That video suggests trying to make oneself into a Viking by uploading an image of oneself and using this prompt:
A realistic photo of a 20 year old male Viking warrior with tribal tattoos and braided hair. Background is ancient Denmark coastal village.
I uploaded a recent photo of myself and used that exact prompt except that I put '50 year old' instead of '20 year old' and got four images that weren't as nice as the ones of the young man in the video. This was the best of the four. I suppose the likeness is good, though?
Remember the photo of my brother Peter that I asked ChatGPT to put into a suit of armor? Let's try doing the same thing with Midjourney, with this prompt:
Can you make an image of this person dressed in a silvery suit of armor except for his helmet which he holds under his right arm? It's a mid-length portrait, hyper-realistic. In the background is an out-of-focus crowd. The weather is cloudy. We are in medieval times.
Here are the four generated images. For the record, I used these settings: mode = raw, not standard, version = standard, not draft, version number = 7, stylization = 100, weirdness = 0, variety = 0, Omni-strength = 150. Yes, there really is a 'weirdness' setting, and it does just what you would expect.
What do you think? The likeness is quite good in each case, but the face isn't very realistic if you look closely. None of the images are usable because of the modern shirt and in some cases the modern-dressed people in the background, despite my specifying that we are in medieval times. I asked for a crowd but forgot to specify that they should be sitting in stands. The images are not mid-length (waist up), they are head and shoulders images. In two cases the armor seems very incomplete. This is a disappointing start with Midjourney. The most usable image is the last one because we can't see much of his shirt and it's not obvious that the crowd is in modern dress. Midjourney produces 16:9 images with dimensions 1456 x 816 pixels. That's a bit small, but Midjourney includes the possibility of upscaling these images (at a cost) to 2912 x 1632 pixels. Let's try that with the last image. You won't see any difference unless I zoom in, so I'll do that (using the 'subtle' option rather than 'creative'):
I think you can see a small difference, e.g. in the pattern of the shirt, but the skin doesn't look any more realistic. But maybe that won't be a problem since this is only going to be the first frame of a video clip - maybe Veo 3 will make the skin look more realistic in the rest of the clip?
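As a side note on the pixel dimensions quoted above (1456 x 816 before upscaling, 2912 x 1632 after), here is a small Python sketch that checks how close those sizes are to a true 16:9 frame and how much the upscale actually adds. The variable and function names are my own, purely for illustration. Only the four pixel values come from the text.

```python
from fractions import Fraction

# Pixel dimensions quoted in the text (width, height).
base = (1456, 816)       # image as generated
upscaled = (2912, 1632)  # image after upscaling


def aspect(width, height):
    """Return the exact width:height ratio as a reduced fraction."""
    return Fraction(width, height)


# Linear scale factor in each direction, and the gain in total pixel count.
scale_w = upscaled[0] / base[0]
scale_h = upscaled[1] / base[1]
pixel_gain = (upscaled[0] * upscaled[1]) / (base[0] * base[1])

# Height that an exact 16:9 frame would have at the generated width.
exact_height = base[0] * 9 / 16

print("generated ratio:", aspect(*base), "=", round(float(aspect(*base)), 4))
print("exact 16:9     :", Fraction(16, 9), "=", round(16 / 9, 4))
print("height for exact 16:9 at width", base[0], "->", exact_height)
print("upscale factor :", scale_w, "x", scale_h)
print("pixel gain     :", pixel_gain)
```

If the arithmetic is right, the generated size is only a few pixels short of an exact 16:9 frame, and the upscale doubles each side, which gives four times as many pixels. That may matter if a video tool crops or pads the first frame to a strict 16:9 ratio, though I have not checked how Veo handles this.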
So, should I give up on Midjourney and return to ChatGPT? I think I need to finish the jousting scene using ChatGPT, and then we'll see... The Omni-reference feature in Midjourney is very valuable to me.
I try generating the image again with a modified prompt in Midjourney. I set Omni-strength = 100 and stylization = 0.
Make an image of this person dressed in a silvery suit of armor except for his helmet which he holds under his right arm. He should have a determined, serious expression. It's a mid-length portrait (from the waist up), hyper-realistic. In the background is an out-of-focus crowd seated on stands. The image is set in medieval times so neither the man in armor nor the crowd should be wearing any modern clothes. The weather is cloudy, with no direct sunshine.
This is turning into a regular fashion show! These are an improvement apart from the last one which is very stylized. There is more realism. Peter's patterned shirt is no longer visible. The crowd are sitting in stands. But again none of the images are usable, because the crowd is again very obviously wearing modern clothes despite my very specific prompt.
I mentioned earlier that Midjourney has a 'weirdness' control - I'm sure you're curious to know what that will generate. So am I! I tried using the same image and the same prompt but with weirdness set to the maximum value, 3000, as well as these adjustments: mode = standard, version = standard, not draft, version number = 7, stylization = 1000 (the maximum), weirdness = 3000, variety = , Omni-strength = 100 (the maximum). Brace yourself for the result!
All resemblance to Peter has gone and there is less realism but oddly the crowd could pass as being medieval (they liked to dress in blue, apparently).
Let's get back to the clip where the king and princess speak to the Black Knight. For that I decide to continue with ChatGPT. After getting the image with the unwanted horse I sent this prompt:
I asked for an image in 3:2 format but did not ask for a horse. Can you make me a similar image but in 3:2 format, lighter and without the horse, please?
and got this (but I had to lighten the image and make the colours less warm)
The images you generate for me are increasingly murky and always have warm colours. Can you brighten the image and apply neutral colour balance. Also, can you make the princess look rather younger?
Even after this prompt I have to further lighten the image and make the colours less warm to get this. The princess is now too young relative to the images I have already used, so I'll stick with the older princess. I reverse the 'older' image to make it look less like the one we saw before with Lord Peter, then uncrop (widen) it and send it to Veo with this prompt:
The man at the right says with a grave tone in a noble British accent "You have won the tournament and my daughter's hand but we have never seen your face." The same man, the man at the right, then says "Please lift your mask" As he speaks he puts his left hand on top of the woman's right hand. The man at the right does not move. Hyper-realistic imagery.
(Actually the first prompt I tried went wrong when the king was supposed to say "Please lift your mask", as if there was confusion about who was supposed to say that, so the prompt above is a modified one, my second attempt.)
I'm happy with the result, except for the Black Knight's little grunt at the beginning, which I will edit out. There is light rain that I didn't ask for with this prompt - maybe Veo has memory after all? My belief that Veo has no memory is the reason I use visual cues like 'the man on the left' instead of 'the knight'.