Stable Diffusion is a deep-learning text-to-image model released by Stability AI.
You might not have noticed, but there has been a quiet revolution in AI-generated art. With the rise of DALL-E, Midjourney, Imagen and now Stable Diffusion, we have seen deep-learning AI systems that can generate images simply from text prompts, analogous to the way AI systems like GPT-3 have revolutionised text-to-text generation.
In simple terms, this means you type a short, descriptive sentence like "wealthy cat wearing a top hat painted by Rembrandt" and the AI system then generates images that are variations on that theme. You can see an example I generated myself using that exact prompt:
This is known as "text-to-image" or just "txt2img". You input a description and the AI model generates images based on that input.
Not so long ago these systems were only accessible to computer scientists and researchers working at the tech giants, like OpenAI or Google. They then started to become more widely available, but required expensive subscriptions to either APIs or web-based front-ends that wrapped the API. So whilst they became accessible to more people, they still tended to be used mostly by tech "geeks" or the more dedicated professional digital artists.
But this all changed in 2022 when Stability AI released their own deep-learning text-to-image model called Stable Diffusion. (The name references the diffusion process that these image generators use to take noise and refine it, step by step, until it settles into a "stable" image.) The big difference is that Stable Diffusion is open source - anyone can download and run it on a home PC (albeit one with a powerful GPU and lots of VRAM). This has effectively democratised the generation of images using AI and opened it up to a much wider audience, as you can see on the Stability AI Discord channel.
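To make the "noise refined into an image" idea a little more concrete, here is a deliberately tiny Python sketch. It is not how Stable Diffusion is implemented: the real model uses a trained neural network to predict what noise to remove at each step, whereas this toy cheats by already knowing the target. The function name `toy_denoise`, the five-value "image" and the step count are all made up for illustration.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration only: pull random noise towards a known target.

    A real diffusion model never sees the target. A trained network
    predicts the noise to remove at each step instead.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        # Close a growing share of the remaining gap on every step,
        # so the final step lands on the target.
        blend = 1.0 / (steps - step)
        image = [x + blend * (t - x) for x, t in zip(image, target)]
    return image


target = [0.0, 0.25, 0.5, 0.75, 1.0]
result = toy_denoise(target)
print(result)
```

The only point here is the shape of the loop: begin with noise, then nudge it a little at a time until something recognisable is left.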
Whilst it's not super-simple to get running (you need Git, Python and an understanding of command-line tools), there are many ways of accessing it, including GUIs and commercial online versions such as DreamStudio. You can also access it via the simple and free web interface on Hugging Face. To run it at home you also need a powerful GPU - preferably a recent Nvidia RTX card with 8GB of VRAM.
What Can it Do?
Basically, whatever you can dream up, it can generate - with some caveats. AI models need training, and Stable Diffusion itself was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset that utilises images "scraped" from the internet. This obviously influences what it can "dream" up, and there are whole debates about the ethics of this. It's also dependent on how it interprets your text input - natural language parsing is a whole field in itself. Then there's also a random factor, as you usually provide a random "seed" when rendering your image(s). On top of this there is a whole art to tailoring your prompts to generate what you require - it's not quite the case that you can just type in anything you want, as you will get a very literal result. In fact there are whole tools dedicated to generating prompts, such as Magic Prompt. There is also now a dedicated Stable Diffusion search engine in the form of Lexica.
So, enough boring text: here are a few examples of things I've generated using Stable Diffusion. (You can view many more in the AI Art gallery on this very site.)
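The "seed" is worth a quick illustration. In Stable Diffusion the seed fixes the random noise the image starts from, so the same seed with the same prompt and settings should reproduce the same starting point. The sketch below shows only that general principle using Python's standard `random` module; the `initial_noise` helper is hypothetical and is not part of any Stable Diffusion tool.

```python
import random


def initial_noise(seed, size=4):
    """Return a short list of Gaussian noise values for a given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(42)
same_b = initial_noise(42)
different = initial_noise(43)

print(same_a == same_b)     # same seed, identical starting noise
print(same_a == different)  # different seed, different starting noise
```

This is why it pays to note down the seed of an image you like: you can return to it later and tweak only the prompt.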
Want to see how Scarlett Johansson might look in a futuristic sci-fi Barbarella / Blade Runner crossover?
Or what if you want to blend the disparate aesthetics of punk rock with Art Nouveau?
And talking of punk, you can also do something more photo-realistic, by manipulating real images and combining them with fantasy elements to create "really angry punk rockers":
Or something more pleasing, as I dreamt up a "girl in a red coat walking through the forest in Autumn with dappled sunlight in style of Studio Ghibli":
Or, for fun, imagining what you would get if you combined David Bowie's album covers with those of Iron Maiden...
Or perhaps a rumination on death and decay, creating a virtual collage of things you might find decaying on a forest floor...
Or the same concept but using magical things you might find on a beach (or an installation at the Tate Modern):
Or a beautiful witch with thorns and roses in their hair...
Or take the iridescent colours of a peacock feather and apply them to an exotic virtual model...
If you want to view more, then please head over to my AI Art gallery.