🤖 How does AI generate images?

Text-to-image 101

Reading time: 3 minutes

Hi friends! I’ve been working on a bigger project involving a lot of text-to-image and went down a rabbit hole looking at the underlying principles. This week, I’m taking you to the latent space 👽 Let’s get to it.

  1. 🖼 Explainer: How does an AI generate images?

  2. 🎥 Watch: An AI artist explains his workflow

  3. 🛠 Tool: Bing AI opened for the public, added extra features

1. 🖼 Explainer: How does an AI generate images?

  • The basic principles

    • 1. Input of training data. Training data is fed to the AI. Training data consists of a) images, and b) text describing the image.

    • 2. AI organizes all the data with self-invented variables. The AI sorts the training data, and every image it could possibly make, according to variables. It invents those variables itself to “relate” all the data to each other. Then it fills in the gaps between the examples it has seen. Those gaps are where creation takes place.

    • 3. The variables create the latent space. You can think of the latent space as a map where all the images are organized according to those variables, and where every image an AI could create (with that model) exists.

    • 4. Output is a location in latent space. Text prompts navigate us to that location in the latent space.

  • Since every model (Midjourney, Stable Diffusion, etc) has a completely different latent space, they will produce different images and will require different techniques to navigate.

  • The latent space is a fascinating concept. Vox does a great job of explaining more about it here. Highly recommend. There’s a cool Wikipedia page too, here.

  • The concept of the latent space explains a lot about how AI works and what its limitations are. For example, it explains why the AI doesn’t know what letters are, just like it doesn’t know what a person or any other object is. It only knows which pixels tend to go where, based on the connections it learned during training between images and the keywords describing them. Stable Diffusion XL will supposedly be able to do letters, so that’s all coming soon.

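  • Fun fact: every generation starts from random noise, so unless you deliberately reuse the same seed and settings, you will practically NEVER generate the same image twice.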

  • How did text-to-image start? In image captioning, you have an image, and you ask a machine to describe it, to label it with text. Image to text. Easy. What if you flipped that around? You start with a text, and you label it with an image. That’s how text-to-image started.
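
If you like seeing ideas as code, here is a tiny toy version of the four steps above. To be clear, this is not how Midjourney or Stable Diffusion work internally. Real models learn hundreds of variables on their own and use a trained text encoder plus a diffusion process. Everything here is made up for illustration: the 2-D coordinates, the three known words, and the helper names `interpolate`, `prompt_to_location` and `sample_near`.

```python
import random

# Hypothetical 2-D "latent space": each known word gets an invented coordinate.
# A real model would learn these positions (in far more dimensions) from training data.
LATENT_POINTS = {
    "cat": (0.0, 1.0),
    "dog": (1.0, 1.0),
    "car": (0.5, -1.0),
}


def interpolate(a, b, t):
    """Return the point a fraction t of the way from a to b (filling a 'gap')."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def prompt_to_location(prompt):
    """Average the coordinates of the known words in the prompt.

    This is a crude stand-in for a text encoder: the prompt "navigates"
    to one location in the toy latent space.
    """
    known = [LATENT_POINTS[w] for w in prompt.lower().split() if w in LATENT_POINTS]
    if not known:
        raise ValueError("prompt contains no known words")
    count = len(known)
    return tuple(sum(point[i] for point in known) / count for i in range(2))


def sample_near(location, seed, spread=0.1):
    """Add seeded random noise so each seed lands on a slightly different point."""
    rng = random.Random(seed)
    return tuple(coord + rng.gauss(0, spread) for coord in location)


if __name__ == "__main__":
    halfway = interpolate(LATENT_POINTS["cat"], LATENT_POINTS["dog"], 0.5)
    print("Halfway between cat and dog:", halfway)

    location = prompt_to_location("a cat and a dog")
    print("Prompt location:", location)

    print("Seed 1:", sample_near(location, seed=1))
    print("Seed 2:", sample_near(location, seed=2))
```

The point halfway between “cat” and “dog” was never in the training data, yet it still has a place on the map. That is the gap-filling idea in miniature. The seeded noise is a rough picture of why two runs of the same prompt land in slightly different spots.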

2. 🎥 Watch: An AI artist explains his workflow

  • This video is great for highlighting how clumsy imaging AI still is in a lot of ways.

  • For example, let’s say I’m a fashion brand, and I want to have a photo of a model lying down in the desert, wearing my brand’s dress and shoes. The AI can get you 80% of the way there, but not 100% (yet). In many cases, if you need the 100%, you’ll be better off with a real photographer and a real model.

  • Same with something like illustrations. If you have an exact vision in mind, you’re better off in Illustrator. If you’re flexible, then AI is great.

  • For storyboarding, prototyping and inspiration (not minor things), imaging AI is perfect.

  • How best to interface with a machine? Text might not even be the best way to navigate the latent space. So let’s see how that develops.

  • Of course, work that is easily copied and highly execution-based seems to be going away at a much faster rate than anticipated. Especially in China.

3. 🛠 Tool: Bing AI opened for public testing, added restaurant bookings, chat history and video answers

  • You can now use Bing AI to complete tasks. So if a search result recommends a restaurant, it can find a reservation time that works for you and book it all in the chat interface. Pretty neat.

  • “We’re introducing richer, more visual answers including charts and graphs and updated formatting of answers, helping you find the information you seek more easily,” says Bing.

  • Note: it’s still only available by using the Edge browser, which you can get here. It’s great, it’s free, so go try it.

Stay hydrated, eat vegetables. I wish you nice human and non-robot things like sunny skies and solid sleep.

Cheers,
Tanya

P.S.: If you enjoyed this, please share it, as it’ll help other people find the newsletter. If you refer someone using your personal referral link ({{rp_refer_url}}), I’ll give you access to my AI tools database. I sorted 100+ tools by relevance for consultants and for PR, sales and marketing professionals. Subscribe (it’s free). ⚡

📧 You’re still here? If you stayed to the end you're probably cool and I'd like to connect. Give me a book recommendation or just say hi on Twitter or LinkedIn.