The Myth of DALLE
AI-generated images are already disrupting the art world. Will they disrupt journalism?
Vice reported on August 31st that an AI-generated artwork won first place at a state fair art contest, and artists were pissed. On September 30th, DALL-E, the most successful art generation model to date, was fully released to the public, as announced by OpenAI.
Even before then, DALL-E had been available to the public through a waitlist since July 2022. Anecdotally, it was reasonably accessible: even yours truly, with the job title of ‘student interested in NLP,’ was able to get off the waitlist.
Before long, we can expect AI-generated images to permeate our media and our lives. This Brackets post is the first of two essays exploring two connected and pressing questions for an AI-imagined world.
In the Myth of Dalle, I want to unpack the effort it actually takes for a human to generate AI art, and whether it is truly as effortless as the media has sensationalized. Were the artists right to be pissed? Or is Jason Allen, the Colorado-based gaming company owner who won his local art contest, correct in defending the extreme effort it took to ‘make’ his art, with “many weeks of fine-tuning and curating [his] gens”?
Then, in Dalle and the Newsroom, we want to extend AI-generated art one step further into what is a nascent idea, but an idea that I find inevitable — AI-generated images in news articles. We’ll break down some emerging work from researchers, and discuss practicality, potential use cases, and the consequences of actual newsroom adoption.
Hope you’ll enjoy the read!
The Myth of DALL-E
Futuristic flying fish. Origami orangutans. Intergalactic galaxies painted in the style of Leonardo da Vinci.
The power of DALL-E is that it can draw anything you desire.
And several months ago, many Instagram feeds were indeed awash with people gushing over DALL-E’s beautiful, incredible, jaw-dropping outputs.
The myth-conception of DALL-E, however, is that these eye-catching pieces of art are the norm. In fact, they are far from it: on average, DALL-E outputs arrive with many shortcomings.
Let’s look at an example of DALL-E’s problems. I’m in a breakdancing group at school, so for the prompt (the academic word for an input), we’re going to tell DALL-E to draw “an oil painting of a woman breakdancing.”
Immediately we can see that the woman in the painting is not breakdancing!! Instead, the painting resembles ballet. This looks like an example of AI bias: because of its training data, DALL-E seems to associate breakdancing with men so strongly that it struggles to draw female breakdancers.
In comparison, we can use the prompt ‘an oil painting of a man breakdancing.’
The output is a painting of a man. Unlike the previous image of ‘a woman breakdancing,’ the man is unmistakably breakdancing, which supports our hypothesis of potential gender bias in the AI. Ultimately, this would be a good painting if the man didn’t have three legs.
The ‘painting’ that Jason Allen won the Colorado State Fair competition with was ‘Théâtre D’opéra Spatial.’ In later interviews, Allen defended himself by citing the extreme effort he invested in developing the prompt for the piece.
“I have been exploring a special prompt that I will be publishing at a later date, I have created 100s of images using it, and after many weeks of fine-tuning and curating my gens.”
“It takes a lot of time [to make]. I haven’t published the prompt yet as this is part of a large body of work, but when it’s finished I’ll publish the prompt.” — Jason Allen
A prompt is the text inserted as the input to an artificial intelligence model — instructions for DALLE. Is it reasonable for Allen to spend hours and hours designing his prompt? The short, academic, answer is YES.
In a landmark paper in the machine learning world, Language Models are Few-Shot Learners, OpenAI researchers demonstrated that slight variations in prompts can contribute to vast differences in the results of AI models such as GPT-3, the model the original DALL-E was built upon. In short, PROMPTS ARE POWERFUL, and finding the best prompt is a really hard problem.
Instead of spending weeks hand-writing prompts like our AI artists, some researchers are starting to use computers to help generate good prompts. This is emerging as one of the hottest research areas in NLP 🔥
For example, one of my professors, Danqi Chen, and her graduate student Tianyu Gao recently developed a new method for generating prompts! It’s called LM-BFF 👯‍♀️, which stands for better few-shot fine-tuning of language models, or alternatively language models’ best friends forever. Their paper is here. The basic idea is that they “mask” sections of text and use T5, a language model good at filling in blanks, to fill in the blanks and generate prompts.
Some other exciting methods for generating prompts include Prompt Paraphrasing, taking a prompt and switching some words using a thesaurus; Prompt Mining, searching for prompts from similar sentences in datasets like Wikipedia; and Prompt Translating, paraphrasing by translating human-written prompts to a different language and then back to English.
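To make Prompt Paraphrasing a little more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the tiny thesaurus is hand-written, and the function name paraphrase_prompt is made up, so treat it as a toy rather than how any research system actually works.

```python
from itertools import product

# Hypothetical, hand-written thesaurus for illustration only; a real system
# would draw synonyms from a much larger lexical resource.
TOY_THESAURUS = {
    "painting": ["painting", "portrait"],
    "woman": ["woman", "lady"],
    "breakdancing": ["breakdancing", "breaking"],
}


def paraphrase_prompt(prompt, thesaurus):
    """Return every variant of `prompt` made by swapping in listed synonyms."""
    words = prompt.split()
    # Each word maps to its alternatives; words not in the thesaurus stay as-is.
    options = [thesaurus.get(word, [word]) for word in words]
    # Take one choice per word position and join each combination back into text.
    return [" ".join(choice) for choice in product(*options)]


variants = paraphrase_prompt(
    "an oil painting of a woman breakdancing", TOY_THESAURUS
)
for variant in variants:
    print(variant)
```

Three swappable words with two options each give eight prompt variants from a single human-written prompt, and the original prompt is always among them.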
All of these computer-assisted methods help us generate large quantities of prompts far faster than humans can. With many more prompts, we can feed each one into the model and keep only the prompts that give the best results, improving performance.
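The generate-then-select loop can be sketched in a few lines of Python. This is only a sketch under loud assumptions: select_best_prompts and toy_score are hypothetical names, and toy_score is a stand-in that just counts desired words. In real life the score would come from running the model on each prompt and rating the outputs, whether by a metric or by a human.

```python
# Hypothetical list of terms we want the prompt to mention (illustration only).
DESIRED_TERMS = {"oil", "painting", "breakdancing", "freeze"}


def toy_score(prompt):
    """Stand-in scorer: counts how many desired terms the prompt contains.

    A real scorer would run the model with this prompt and rate the results.
    """
    return len(DESIRED_TERMS & set(prompt.lower().split()))


def select_best_prompts(prompts, score_fn, top_k=2):
    """Rank candidate prompts by score (highest first) and keep the top_k."""
    ranked = sorted(prompts, key=score_fn, reverse=True)
    return ranked[:top_k]


candidates = [
    "a woman dancing",
    "an oil painting of a woman breakdancing",
    "an oil painting of a woman breakdancing in a freeze",
    "a painting of a dancer",
]

best = select_best_prompts(candidates, toy_score, top_k=2)
print(best)
```

The point of the design is that the selection step does not care where the candidates came from, so paraphrased, mined, and translated prompts can all go into the same pool.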
In conclusion, while “real” artists undoubtedly spend significantly more time creating art, there remain legitimate contributions from ‘AI artists’ in crafting prompts for DALL-E.
In a few years, however, there may be no more need for humans to manually design and improve prompts. We’ve already observed an incredible wave of academic research in prompting, where machines are able to design better prompts for AI tasks. On the other hand, evaluating whether the outputs of DALL-E are good art is a subjective task, one that humans will most likely always be better at than machines.
Generating AI art will inevitably become easier and easier.
Rather than the amount of effort, perhaps the art world is recoiling more from the ethics of AI art, or its widespread accessibility. In this, the art world is similar to the journalism industry, which is constantly being disrupted by technology.
“Art is dead, dude. This isn’t going to stop,” Allen told the New York Times, while “urging artists to overcome their objections to A.I., even if only as a coping strategy.”
Thanks for reading The Myth of Dalle. So far, we’ve seen that AI images are disrupting the art industry. In Dalle in the Newsroom, the next issue of Brackets, we will discuss whether Dalle is ready to disrupt the journalism industry.
In particular, Dalle in the Newsroom will touch upon the adoption of AI images in journalism, including its ethics, use cases, limitations, and some recent work on newsrooms adding AI images to news articles.
If you’d like to say hi, or have any feedback for the next article, please drop a comment or message. Otherwise,
See you next time :)