The images are generated from what the model learned from the tagged images in the training set, but they are in no sense composites of them. Often an image DALL-E 2 creates will look nothing like any image in the training set. How it works is explained in the video at the bottom of the first post on DALL-E 2, “DALL-E 2 Explained”. To simplify enormously: it learns to associate features in the training images with words and phrases in their descriptions. Then, when presented with a prompt, it combines the features associated with the words in the prompt, adjusting the image until it comes as close as possible to matching the prompt when the process is reversed, that is, when the image is read back as a description.
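To make that simplification concrete, here is a toy sketch in Python. It is emphatically not how DALL-E 2 is implemented (the real system uses learned neural embeddings and a diffusion process); it only illustrates the idea with made-up data. Everything in it is an assumption for illustration: the four hand-picked features, the three-image training set, and the helper names `learn_associations`, `describe` and `generate`.

```python
import random

# Hypothetical toy data: each "image" is four hand-picked features
# (redness, blueness, roundness, squareness) paired with a caption.
# Note that no "blue box" appears in the training set.
TRAINING_SET = [
    ([1.0, 0.0, 1.0, 0.0], "red ball"),
    ([1.0, 0.0, 0.0, 1.0], "red box"),
    ([0.0, 1.0, 1.0, 0.0], "blue ball"),
]


def mean(vectors):
    """Element-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def learn_associations(training_set):
    """For each caption word, learn how its images differ from the average image."""
    overall = mean([features for features, _ in training_set])
    by_word = {}
    for features, caption in training_set:
        for word in caption.split():
            by_word.setdefault(word, []).append(features)
    offsets = {
        word: [a - b for a, b in zip(mean(vectors), overall)]
        for word, vectors in by_word.items()
    }
    return overall, offsets


def describe(image, overall, offsets):
    """The reverse direction: which learned words does this image match?"""
    centered = [a - b for a, b in zip(image, overall)]
    return {
        word
        for word, offset in offsets.items()
        if sum(c * o for c, o in zip(centered, offset)) > 0
    }


def generate(prompt, overall, offsets, steps=50, rate=0.2, seed=0):
    """Start from random noise and nudge it toward the prompt's combined features."""
    rng = random.Random(seed)
    target = list(overall)
    for word in prompt.split():
        target = [t + o for t, o in zip(target, offsets[word])]
    image = [rng.random() for _ in overall]
    for _ in range(steps):
        image = [x + rate * (t - x) for x, t in zip(image, target)]
    return image


overall, offsets = learn_associations(TRAINING_SET)
image = generate("blue box", overall, offsets)
print([round(x, 2) for x in image])
print(describe(image, overall, offsets))
```

The result for "blue box" has more blueness than redness and more squareness than roundness, and `describe` reads it back as "blue" and "box", even though no blue box was ever in the training set. It is assembled from the learned word associations, not pasted together from the training images.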