⚡ Why LUCK matters in image generation

+ attend meetings without actually being in them

Reading time: 3 minutes

scroll down for the prompt

Hello hello hello! I’m in Milan for another week and in Barcelona afterwards. Hit me up if you are around! Subscribers might get access to an exclusive cerveza.

This week:

  1. Adobe Photoshop implements AI. What’s next? 🤖

  2. An Otter meeting assistant 🦦

  3. Why AI can’t (& won’t soon) get eyes right (。◕‿◕。)

  4. ChatGPT plugins that are actually useful? 🤔

  5. Why LUCK is so important in image generation 🧸

auto expand with Photoshop’s “generative fill”

1. Adobe Photoshop implements generative AI tools. What’s next? 🤖

  • Adobe, the legacy birthplace of fake imagery, has implemented a tool that can add or remove objects, change backgrounds with contextual accuracy, and more.

  • Canva also has similar tools if you want to go and try them for free there.

  • Image editing with AI is still magical the first time you try it, and I highly recommend you do.

  • All this being said, generative AI is basically becoming mainstream.

  • Not because almost everyone (Microsoft, Google, Snap (bad rollout), TikTok (soon), and more) has their own application. It’s because people’s parents are doing this. It’s within reach for everyone. The Facebook moment is kicking in. We’re reaching peak mainstream.

  • What’s next?

  • Think of it as the Spotify moment. Spotify was once the hot new tech kid on the block with its AI-powered recommendations. Now, we barely think twice. It’s embedded. Nobody’s freaking out about radio losing its job. We barely remember a time before algorithmic music streaming. Spotify’s weekly recommendations still feel like magic to me, and that’s not entirely unlike most generative AI tools.

  • A lot of the tools seem somewhat similar at the moment. We’re converging. Partially because many tools are based on ChatGPT (chatbot) or Stable Diffusion (open source image generation).

  • Interfaces are another avenue for innovation. Currently, we’re communicating via chatbot. But just like the smartphone brought the internet into our pockets, I’m curious what our next form of engagement with AI will look like. Perhaps video chat? Actually, that brings me to the next point.

Otter’s transcriptions

2. An Otter meeting assistant 🦦

  • Otter joins your Google Meet and Zoom calls to take meeting notes. At the end of the meeting, you’ll be sent the transcript, including screenshots of presentations and more.

  • I typically take pretty intensive meeting notes, breaking them down into takeaways and action items - but I’ve been using Otter for some work meetings and it’s been wonderful. I still take my own notes, just far fewer.

  • It feels different to have an OtterPilot in the room. It’s literally like another person joining your Zoom call and automating a classic intern duty for you: taking notes.

  • It’s shocking how quickly you get used to always having Otter around. Whenever we’re not sure of what someone said last meeting, or a decision that was taken, we can just pop into the Otter notes and double check. Makes for a great workflow and second brain mindset.

  • Pricing: free tier with up to 300 monthly transcription minutes.

  • For $8/m, Otter can even join virtual meetings when you are double-booked. And then still send you transcripts. Kinda genius as it helps to free you up from meetings, without having to miss what was said. It’s like your very own meeting/podcast PA.

  • Verdict: ⭐⭐⭐⭐⭐ that’s a five out of five

3. Why can’t AI get eyes right?

  • This very common question rests on a flawed assumption: that the AI knows what “eyes” are.

  • It doesn’t. Actually, an image generator AI like Midjourney doesn’t know what ANYTHING is. It doesn’t have the faintest idea. It cannot reason or think, it can only sort-of mimic what it’s been told. It’s a savant toddler. Read more about how that works here.

  • So, on the topic of eyes: in photos, pupils don’t always sit perfectly in the middle of the eye. In a profile photo, it looks like your pupils are glued to the side of your eyes.

  • That confuses the AI. It doesn’t know that eyes are only glued to the side when you’re capturing someone in profile.

  • It doesn’t “know” when someone’s looking left or right, because data isn’t usually labeled so meticulously.

  • Most training data descriptions will simply say something like “picture of girl with blue eyes”. So the AI mashes up the billions of images it’s been fed (including ones where the eyes are glued to the side, like in a profile shot), and spits out a blend. Result: eye mangle! (There’s a toy example right after this list.)

  • TL;DR: AI doesn’t know what anything looks like, so it mixes up all possible eye positions.
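If you’re curious what I mean by shallow labels, here’s a toy sketch. The filenames and captions below are completely made up for illustration, not from any real dataset - the point is that nothing in the label records pose or gaze direction:

    # Hypothetical training captions (invented for illustration).
    # Nothing records whether the face is face-on, in profile, or glancing
    # sideways, so the model never learns where eyes are "supposed" to sit.
    training_examples = [
        {"image": "portrait_001.jpg", "caption": "picture of girl with blue eyes"},  # face-on
        {"image": "portrait_002.jpg", "caption": "picture of girl with blue eyes"},  # in profile
        {"image": "portrait_003.jpg", "caption": "picture of girl with blue eyes"},  # glancing left
    ]

    # All three share the exact same text label, so "blue eyes" ends up
    # meaning an average of every eye position the model has ever seen.
    for example in training_examples:
        print(example["image"], "->", example["caption"])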

4. Fav ChatGPT plug-ins

  • ⚠️ Plus users only. These are only available to ChatGPT Plus users ($20 USD / month). My suggestion: try it for a month and see if the time saved is worth it to you. I can say with a resounding YES that it’s worth it.

  • First up: the browsing functionality is excellent. Turn it on by going to Settings > Beta features > Browse with Bing + Plugins. To use it, hover over GPT-4 and click on “Browse with Bing”.

  • What do I use the browser for?

    • When I’m traveling, I like to give ChatGPT my approx area and fav things to do, and have it plan out weekend trips for me. It takes into account distance and preferences, and can even add the Google Maps links to all the locations if you ask it nicely.

    • Prompt: I am in [PLACE] at [TIME]. I like [INTERESTS]. Please create a plan for me this weekend. Give me different options.

  • Let’s be real though - a lot of the plug-ins so far I’d rather... plug... out? They’ve been subpar. Here are the exceptions:

    • Show me: this tool lets you create mind maps directly in ChatGPT. I’m very much a visual learner and this has been great.

    • Wolfram: helpful for creating graphs and charts.

5. Why LUCK is so important in image generation

  • For those (woefully) unfamiliar with image generation tools: people don’t talk enough about the LUCK involved in image generation.

  • In Midjourney, every prompt generates 4 images. If I click regenerate, it generates another set of four different images.

  • One prompt can generate millions of different images. Whether you happen to find one you love in the first or second batch instead of the tenth or eleventh is a function of luck.

  • Skill comes into play too, because you can fine-tune and narrow prompts down quite a bit. But the skill part has been overplayed (in my opinion), and the luck aspect is extremely underrated. (If you like numbers, there’s a little back-of-napkin sketch at the end of this section.)

  • For example, out of the above four - I was looking for the “atmosphere” of the third photo. It took about 10 regenerations to get there.

  • The prompt I used here:
    joonay the gyoza girl, in the style of mixes realistic and fantastical elements, matte photo, toycore, chinese iconography, berrypunk, colorful cartoon, realistic sculptures --s 750 - @electricdreamer

  • You can try it for yourself! I’d be curious to see your outcomes (DM or mention me if you like: @twizzles7 or LinkedIn).

  • You see words like “toycore” and “berrypunk”. These words don’t actually exist. We’ll delve into that next time.
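For the number-curious: here’s that back-of-napkin simulation of the luck factor. The 5% “keeper” chance is a number I made up purely for illustration - the point is just how wide the spread is between a lucky run and an unlucky one on the exact same prompt:

    import random

    # Made-up assumption: any single generated image has a 5% chance of
    # being a "keeper" you actually love. Midjourney returns 4 per prompt.
    KEEPER_CHANCE = 0.05
    IMAGES_PER_BATCH = 4

    def batches_until_keeper() -> int:
        """Simulate how many regenerations it takes to land a keeper."""
        batches = 0
        while True:
            batches += 1
            if any(random.random() < KEEPER_CHANCE for _ in range(IMAGES_PER_BATCH)):
                return batches

    # "Run the same prompt" ten thousand times; the spread is the luck.
    runs = [batches_until_keeper() for _ in range(10_000)]
    print("average batches needed:", sum(runs) / len(runs))
    print("lucky run (min):", min(runs), "| unlucky run (max):", max(runs))

With those made-up numbers you’d need roughly five batches on average, but individual runs land anywhere from the very first batch to dozens deep. That spread is the luck.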

For now - stay hydrated, eat vegetables. I wish you nice human and non-robot things like sunny skies and solid sleep.

Cheers,
Tanya

PS.: I’m switching over to a different posting schedule so I can prioritize quality > quantity. If you enjoyed this, please share on your social platforms or forward to your friends as it’ll help other people find the newsletter. It’s also very encouraging for me to keep writing and see the subscription thing tick up.


📧 You’re still here? If you stayed to the end you're probably cool and I'd like to connect. Give me a book recommendation or just say hi on Twitter or LinkedIn.