I have written here earlier about OpenAI’s DALL-E 2 in “OpenAI’s DALL-E 2 Text to Image Synthesis Astonishes Experimenters”. Those who have obtained early access to the program (I, like most, remain on the waiting list) continue to report stunning results from the text-prompt-to-image synthesis and modification program. For example:
While we wait to try the monster DALL-E 2 model, Boris Dayma and Pedro Cuenca have created an open source clone, DALL·E mini, posted the complete source code on GitHub, and set up a live playground at Hugging Face where you can run it from your browser, “Official DALL·E Mini Demo”. (Experimenting with the demo requires some patience, as Hugging Face imposes concurrency limits on access to their servers and you may have to try several times or wait until they’re less busy to avoid getting a “Busy: try later” message. Here are instructions and resources for setting up your own local server.)
While it lacks the staggeringly large training set of DALL-E 2, the results are still impressive and improving as the developers train it on more images from the Internet.
Here is what I got when I prompted DALL·E mini with “alien message in human DNA”, going for an illustration for my story “We’ll Return, After This Message”.
The images are created based upon the tagged images in the training set, but they are in no sense composites of them. Often an image it creates will look nothing like any in the training set. How it works is explained in the video at the bottom of the first post on DALL-E 2, “DALL-E 2 Explained”. To simplify enormously, it learns to associate features in the training images with words and phrases in their descriptions and then, when presented with a prompt, combines features associated with the words in the prompt until, when the process is reversed, the result comes as close as possible to matching the prompt.
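The matching of prompt features to image features can be illustrated with a toy sketch. This is not DALL-E’s actual code or architecture: the four-dimensional vectors and the image and phrase names below are made up for illustration, and a real model would produce such embeddings with trained text and image encoders rather than by hand. The sketch only shows the core idea of ranking candidate images by similarity between a combined prompt vector and image feature vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": in a real model these would come from trained
# text and image encoders; here they are invented 4-element vectors.
text_features = {
    "alien message": np.array([0.9, 0.1, 0.0, 0.2]),
    "human DNA":     np.array([0.1, 0.8, 0.3, 0.0]),
}

image_features = {
    "glowing glyphs over a double helix": np.array([0.7, 0.6, 0.2, 0.1]),
    "a corgi shearing a sheep":           np.array([0.0, 0.1, 0.9, 0.8]),
}

# Combine the prompt-phrase features (crudely, by averaging) and rank
# the candidate images by similarity to the combined prompt vector.
prompt_vec = np.mean(list(text_features.values()), axis=0)
ranking = sorted(image_features.items(),
                 key=lambda kv: cosine_similarity(prompt_vec, kv[1]),
                 reverse=True)
for name, vec in ranking:
    print(f"{cosine_similarity(prompt_vec, vec):.3f}  {name}")
```

With these made-up vectors, the DNA-like image ranks far above the corgi one, which is all the sketch is meant to show: features shared between prompt and image pull the similarity score up.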
This is just a reminder that DALL-E mini is open to the public and ready to create images from your prompts when you click the link earlier in this sentence. Well, not necessarily ready: because the demand is so great, it may report it’s too busy and ask you to come back later, but if you persist after the masses in the legacy continental-scale empires have dozed off, you can get your results. Here’s one that popped into my mind earlier today.
Not exactly what I expected, but interesting…and disturbing. Who hasn’t had cats like this?
What happens if you start with that sheep-and-Corgi one and try going Boolean? I mean, can it handle that? “sheep being sheared by Corgi NOT uncanny”?
Is “trending on ArtStation” part of the prompt? All the examples you show include that. Is that the modern equivalent of a “may it please the deity” invocation?
That seems to encourage it to produce art-style results as opposed to cartoons, line drawings, or other kinds of output. Here is one I got when I prompted DALL-E mini with “tyrannosaurus with orange cat face”.
I would caption this “Tyrannosaurus highly dubious of result from Fourmilab experiment crossing dinosaur with psychotic orange cat.”