Good Practice Highlights, The University of Sheffield

How to Generate Consistent Images using AI

Generating images using AI tools such as Adobe Firefly is an increasingly popular method of creating visual materials. However, consistency remains an issue, as it is challenging to create a collection of images that work together as a set.

To a certain degree, image prompting is a science and finding a consistent prompt which can generate consistent images requires some trial and error, but it is achievable.

This article was originally written as advice for a JISC project on AI image generation, in response to a question about comic book art, but the theory holds whatever your subject matter.

Dave Holloway, Senior Digital Learning Advisor

Digital Learning Team, Education Development Services

Generating images is an iterative process, meaning that it requires trial and error and you build upon your successes and failures to reach the point where you are happy with your final output. Much of the work involves manipulating your image prompts - changing influences, styles, formats, text weighting etc - into an order and structure which works best with your image generation tool.

A colleague approached me for advice on using AI image generation to make a comic book, a project which would require a consistent, fixed aesthetic and character design. Because of the way image generation pulls from its source library, creating a set of images with a recognisable and connected aesthetic is hard to do. However, by using a set pattern within your image prompt and only changing key variables, you can make a set of images which look like they came from the same source.

As an example, I created a set of images for an invented comic book called Bunnyman. I began by generating some ideas using prompts featuring popular artists and comic book styles. The text prompt I used was the following:

"superhero comic book art in the style of [artist name] and [comic era or brand] featuring a superhero called Bunnyman using [art styles - pen and ink, water colour, airbrushed acrylic etc]"
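
If you want to try many combinations at this exploration stage, the bracketed template above can be filled in programmatically. The following is a minimal Python sketch rather than part of the original workflow: the function name `exploration_prompts` is hypothetical, and the two era values are placeholder assumptions, since the article does not list which eras or brands were tried. It only builds the prompt text; you would still paste each prompt into your image generation tool.

```python
from itertools import product

# Template taken from the article; the braces mark the slots that vary.
TEMPLATE = (
    "superhero comic book art in the style of {artist} and {era} "
    "featuring a superhero called Bunnyman using {medium}"
)

# Illustrative values only; swap in your own influences.
ARTISTS = ["Katsuhiro Otomo", "Jean Giraud"]
ERAS = ["1950s American comics", "1980s manga"]  # assumed placeholder examples
MEDIA = ["pen and ink", "water colour", "airbrushed acrylic"]


def exploration_prompts(artists, eras, media):
    """Return one prompt for every combination of artist, era and medium."""
    return [
        TEMPLATE.format(artist=a, era=e, medium=m)
        for a, e, m in product(artists, eras, media)
    ]


prompts = exploration_prompts(ARTISTS, ERAS, MEDIA)
for prompt in prompts:
    print(prompt)
```

Two artists, two eras and three media give twelve prompts, which is a manageable number to generate and compare by eye.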

I really liked one of the images, so I generated a full version of it, uploaded it back into my image generation tool and asked it to describe the picture as a prompt. Not all tools have a describe feature, but you can quite easily find one which does by Googling 'generate prompt from image'.

The prompt that was created was:

"A humanoid rabbit with long ears, wearing tactical gear, stands in the ruins of an ancient city. This is an ink drawing in the style of Katsuhiro Otomo and Jean Giraud, presented in black and white."

By generating a prompt this way I am ensuring that it is the clearest description of the image that my generation tool recognises. I should now be able to create more images in a similar style. I refer to this as my base prompt.

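I then generated more images using the base prompt and simply changed the focal point of the image. In the following examples, only the middle clause describing the action and location changes from one prompt to the next; the opening description and the closing style sentence stay the same.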

A humanoid rabbit with long ears, wearing tactical gear, fights henchmen in the centre of an abandoned shopping mall. This is an ink drawing in the style of Katsuhiro Otomo and Jean Giraud, presented in black and white.

A humanoid rabbit with long ears, wearing tactical gear, embraces a lost child on the rooftop of a Japanese skyscraper. This is an ink drawing in the style of Katsuhiro Otomo and Jean Giraud, presented in black and white.

A humanoid rabbit with long ears, wearing tactical gear, watches TV and eats pizza in a dressing gown. This is an ink drawing in the style of Katsuhiro Otomo and Jean Giraud, presented in black and white.
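
The pattern in these prompts can be sketched in Python as a fixed subject, a variable scene and a fixed style sentence. This is an illustrative sketch only: the names `SUBJECT`, `STYLE`, `SCENES` and `build_prompt` are hypothetical, and the code simply assembles the text that you would then give to your image generation tool.

```python
# Fixed parts of the base prompt, copied from the article.
SUBJECT = "A humanoid rabbit with long ears, wearing tactical gear"
STYLE = (
    "This is an ink drawing in the style of Katsuhiro Otomo and "
    "Jean Giraud, presented in black and white."
)

# The only part that changes between images: the action and location.
SCENES = [
    "stands in the ruins of an ancient city",
    "fights henchmen in the centre of an abandoned shopping mall",
    "embraces a lost child on the rooftop of a Japanese skyscraper",
    "watches TV and eats pizza in a dressing gown",
]


def build_prompt(scene, subject=SUBJECT, style=STYLE):
    """Place a scene between the fixed subject and the fixed style sentence."""
    scene = scene.strip().rstrip(".")  # tolerate stray spaces or a trailing full stop
    return f"{subject}, {scene}. {style}"


for scene in SCENES:
    print(build_prompt(scene))
```

Keeping the fixed text in one place means a new image only ever needs a new scene, so the wording that drives the aesthetic cannot drift by accident between prompts.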

As you can see, by keeping the base text prompt and only changing the actions and locations, I managed to generate images which look like they came from the same illustrator and belong together as part of the same project.

After that, by adding a few more variations and a little bit of text work in Photoshop, I even have a cover for my comic book.

"A humanoid rabbit with long ears, wearing tactical gear, holds an American flag aloft in a post-apocalyptic wasteland. This is an ink drawing in the style of Katsuhiro Otomo and Jean Giraud, presented in vibrant 1950s colour. Comic cover art."

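The cover prompt changes two more variables than the earlier examples: the colour finish and an extra closing phrase. A minimal Python sketch of that variation follows, assuming hypothetical parameter names `finish` and `suffix`. It only assembles the prompt text and is not tied to any particular image generation tool.

```python
SUBJECT = "A humanoid rabbit with long ears, wearing tactical gear"
# The colour finish is now a slot rather than fixed text.
STYLE = (
    "This is an ink drawing in the style of Katsuhiro Otomo and "
    "Jean Giraud, presented in {finish}."
)


def build_prompt(scene, finish="black and white", suffix=""):
    """Build a prompt, optionally overriding the finish and adding a closing phrase."""
    prompt = f"{SUBJECT}, {scene}. {STYLE.format(finish=finish)}"
    return f"{prompt} {suffix}" if suffix else prompt


# Interior page: the defaults reproduce the original base prompt.
page = build_prompt("stands in the ruins of an ancient city")

# Cover: same subject and artists, different finish plus a closing phrase.
cover = build_prompt(
    "holds an American flag aloft in a post-apocalyptic wasteland",
    finish="vibrant 1950s colour",
    suffix="Comic cover art.",
)

print(page)
print(cover)
```

Because the finish and closing phrase default to the base prompt's values, the interior pages stay unchanged and only the cover opts in to the variation.
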
The key with image generation is to spend lots of time experimenting with different prompt structures, influences, styles and ideas until you get something which looks as close as possible to what you imagined.

Once you have that first image, using it to generate a base prompt will help you ensure consistency across all future generations. Keeping the base prompt and only changing the key action or location text gives you a much greater chance of keeping your aesthetic consistent.
