Protest is in the AI of the Beholder

...or, something other than ChatGPT for Once

Jun 28, 2023

Again, thanks for stopping by! If you haven’t, please subscribe (I’m free!), and recommend to others who might like to engage. So…

I recently gave a talk on Rebecca Solnit’s Hope in the Dark, a wonderful book that someone like me requires; I need to feel that the actions of the few, or of protest in general, matter. (If you haven’t read Rebecca Solnit’s books or columns, read everything.) In preparation, I was talking with a colleague about adding some media for the event, specifically from Midjourney, an AI image generator, and that’s when things got interesting. (As we all know, there is no reading safe haven from ChatGPT these days, so for a change of pace, let’s talk AI images.)

In preparation for my talk, I decided to prompt Midjourney with the single word 'protest' and see what came back (Solnit’s book often discusses protest in different forms). Below are the initial results:

I found the results both surprising and completely unsurprising (you know, that holding two contradictory ideas in your head thing). If I ask Midjourney to render "protest" and this is what the algorithm returns, what are my assumptions about the assumptions Midjourney is mining?

Protest is practiced by young people.
Protest is practiced by white people.
Protest is practiced by people with a "grungy" or "hipster" aesthetic.
Protest happens in urban spaces.
Protest happens on cold and/or wet overcast days.
Protest involves signage (often mocked for spelling errors, but we require full forgiveness here).
Protest, in the bottom left photo, includes what appears to be a peaceful police presence and patriotism in the presence of a flag or two.
Conversely, the bottom right photo hints at some violence, with the female figure's face rendered with abrasions and some blood.
Finally, protest is practiced by people who are often ridiculed for doing so, as the people depicted likely have smartphones or laptops, and for some media voices this is a totalizing hypocrisy that nullifies all their social concerns.

Put another way, these might be the people of Occupy Wall Street, who were widely mocked and dismissed in media coverage as being uninformed, dirty, poorly dressed, lazy, but simultaneously deeply privileged. They are there protesting, but they don't really know why.

In Midjourney, you can ask for variations of any of the returned images. I chose the bottom right image:

**In addition to scrapes, young people at protests either froth at the mouth or eat powdered donuts**

In the above images, I am curious as to what led to the foregrounding of a female figure, while all of the figures represented behind her appear to be closed-mouthed men; only the woman's mouth is open.

Given what I had seen so far, I altered the prompt to "violent protest." Again, the renderings are both surprising and unsurprising:

As you can see above, the word "violent" completely eliminated the appearance of women. I cannot locate a single female figure. Also, in the bottom right photo, the foregrounded figure and all the background figures are open-mouthed yellers, which differs from the earlier image of the female figure out in front of closed-mouthed backers. And in the images on the left, is that law enforcement protesting? I can’t tell.

Again, what do we see? Screaming. Lots of screaming, and whatever word each mouth is in the process of producing, it is violent. One of the photos includes fire, likely in the form of a burning sign. Why is this interesting? Yelled words are violent in Midjourney renderings, while written words are peaceful (where’s Derrida when you need him?). How do we know? Here are the Midjourney results for "peaceful protest," and they all include foregrounded signage and imply a quiet setting (only a few of the "violent" images include signage, and then only in the background):

So what do we see in "peaceful"? Women displace men. Also, not only is there no screaming; there are no mouths! I repeat: The women and other protestors have no mouths because…mouths are violent? Major figures stand masked, a school bus is added, and the top right image appears to include our first person of color. (It's also finally warmer—short sleeves!)

Throughout this little experiment, I am struck by the consistent homogeneity: older people do not protest, and protests are never intergenerational. There is racial homogeneity, and when you add the adjectives "violent" or "peaceful" there is near gender homogeneity. There is also homogeneity in location: cities.

To be clear, I like Midjourney. I think it returns some beautiful, startling images. However, what is rendered in this batch of images is, to my eye, every reportorial stereotype deployed against protest to neutralize that protest's particular goal or effect.

And maybe that’s the point. My last post was about seeking the unexpected. My assumption here is that the algorithm is designed to meet expectation (that’s the goal, right?). Even so, those expectations, whatever they may be, are bursting at the seams with stereotypical assumptions.

The Declining Academic
