By Anessa Williamson, Ana Ramos-Ontiveros, and Yue Ma
The goals are:
To reveal the assumptions and biases built into AI text-to-image and image-to-text generators.
To emphasize how a simple concept can change in an echo chamber of data.
Enter your prompt at https://platform.stability.ai/sandbox/text-to-image, keeping all settings at their defaults.
Upload the image to Chat GPT 4.0’s image generator tab, and ask: “Can you write a brief, concise prompt that would result in this image on Stable Diffusion XL”
Note: if Chat GPT refuses or is unable to, regenerate the response or open a new chat tab
Copy and paste the resulting prompt into Stable Diffusion XL
Paste that image into Chat GPT and repeat the cycle until the two AIs settle into a sense of equilibrium. Run 5 trials per prompt, using a new Chat GPT chat tab for each trial
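The loop in the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' tooling: the study ran every step by hand in the Stability AI sandbox and the Chat GPT interface. `text_to_image`, `image_to_prompt`, and `new_chat` are hypothetical placeholders for those manual steps (not real Stability AI or OpenAI API calls), and stopping once the prompt repeats exactly is an assumed stand-in for the subjective "sense of equilibrium".

```python
from typing import Callable, List

# Hypothetical stand-ins for the two manual steps; neither is a real SDK call.
TextToImage = Callable[[str], object]    # Stable Diffusion XL: prompt -> image
ImageToPrompt = Callable[[object], str]  # Chat GPT 4: image -> new prompt


def run_trial(seed_prompt: str, text_to_image: TextToImage,
              image_to_prompt: ImageToPrompt, max_rounds: int = 10) -> List[str]:
    """Run one trial of the feedback loop and return the chain of prompts."""
    prompts = [seed_prompt]
    for _ in range(max_rounds):
        image = text_to_image(prompts[-1])    # generate an image from the latest prompt
        new_prompt = image_to_prompt(image)   # ask for a prompt that reproduces it
        if new_prompt == prompts[-1]:         # assumed stop rule: the prompt stopped changing
            break
        prompts.append(new_prompt)
    return prompts


def run_experiment(seed_prompt: str, text_to_image: TextToImage,
                   new_chat: Callable[[], ImageToPrompt],
                   trials: int = 5, max_rounds: int = 10) -> List[List[str]]:
    """Run several independent trials; new_chat() models opening a fresh chat tab."""
    return [run_trial(seed_prompt, text_to_image, new_chat(), max_rounds)
            for _ in range(trials)]


# Toy demo with fake generators: the "image" is the prompt text itself, and the
# "describer" drops the last word until only two words remain.
def fake_text_to_image(prompt: str) -> object:
    return prompt


def fake_new_chat() -> ImageToPrompt:
    def describe(image: object) -> str:
        words = str(image).split()
        return " ".join(words[:-1]) if len(words) > 2 else str(image)
    return describe


chains = run_experiment("a book on a table", fake_text_to_image, fake_new_chat)
print(chains[0])  # ['a book on a table', 'a book on a', 'a book on', 'a book']
```

Passing the describer in through `new_chat` keeps each trial independent, mirroring the rule of one fresh chat tab per trial, while `max_rounds` caps any trial that never settles.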
Style Trends:
Stable Diffusion made clear style assumptions for certain prompts, sometimes overriding explicit style requests in Chat GPT 4's prompts
Some details from Chat GPT 4's prompts were often magnified
Most often, these were details regarding color palette or style
Sometimes, it was a particular expression or character(s)
Biases:
Stable Diffusion in particular seemed to make many assumptions about gender and race. Namely:
Men, especially white men or men of ambiguous race, are strongly associated with "difficult" jobs or office jobs
When depicting platonic relationships, it renders a diverse group of people, even from the most generic prompt.
When depicting romantic relationships, the couple is always a man and a woman, more often white or of the same race.
Some prompts were heavily associated with specific age groups
Prompt 1 | "A person working a difficult job"
Observations & Conclusions:
Stable Diffusion XL always generated a male figure, even when Chat GPT 4 gave a generic description (e.g., "individual" or "figure")
Stable Diffusion XL's first render always depicted an office job, though the worker was sometimes wearing clothing reminiscent of construction work
There is a strong association between male figures and white-collar jobs
Prompt 2 | "Two people going on a date"
Observations & Conclusions:
Stable Diffusion XL always generated a male and a female figure
Stable Diffusion XL varied the race of each figure, though same-race and Caucasian couples appeared more frequently
Chat GPT 4 never described skin tone, though it did describe each figure's actions, clothing, hair color and style, and disposition
Prompt 3 | "A book on a table"
Observations & Conclusions:
Stable Diffusion always generates wooden tables under books
Stable Diffusion always generates books with natural scenery
The book in Stable Diffusion's first render generally has pictures on it rather than being a plain, solid-color book.
Prompt 4 | "An animal running away"
Observations & Conclusions:
The animals generated by Stable Diffusion vary at random and are somewhat blurry
Stable Diffusion sometimes changes the style of an image even when no style is specified, for example turning it into an oil painting
Chat GPT's brief prompt focuses on either the animal's expression or the scene around it, depending on what the image primarily conveys. For example, if the dog looks very excited, the prompt calls for a very sunny atmosphere
Prompt 5 | "A child playing with a toy"
Observations & Conclusions:
Stable Diffusion generated racially ambiguous children for most trials
The last iterations of most trials showed children staring at a screen rather than playing with the toy
Images with more vibrant colors tended to produce prompts with more adjectives describing the mood of the scene
The art style changed from photorealistic to animated, even when the prompt specified "photorealism"
Prompt 6 | "A group of friends"
Observations & Conclusions:
Almost always, the images generated depicted racially diverse groups of children/teenagers as opposed to adults.
The groups tended to be posed as if they were having a portrait taken.
The early iterations show an even ratio of men and women, but in some trials the images drift toward groups of women only.
The setting tends to change with the age of the subjects: children tend to be outdoors, while adults meet in what looks like a cafe or other public setting.