⚡Your Beginner’s Guide to Generative AI
Illustrated Generative AI Cheat Sheet
Reading time: 3 minutes
If you're trying to figure out the AI craze and how to navigate the conversations surrounding it: welcome to the ride. This is the place to get started.
What are the key concepts? Generative AI, LLM, and Prompt Engineering
Who are the key players? Midjourney, ChatGPT, and Microsoft
What are the key conversations? Future of Work, AGI, and Privacy
His favorite pictures are memes
Generative AI refers to a type of AI designed to generate new data that mimics the characteristics of a particular domain, such as music, text, or images. These systems have been fed loads of input data that they can creatively remix into a range of outputs that approximates infinity.
Think of it as a master painter. Most painters have done some museum meandering. An AI painter has seen and studied almost every single picture ever made in history. Not just paintings from paleolithic to Pollock to pixel art, but billions of pictures scraped from the internet.
Now take that concept and apply it to not just painting, but also text, music, and more.
Icon of a prompt engineer with a yellow background
What is it? Prompt engineering refers to carefully writing text inputs (prompts) for AI algorithms to generate the desired outcome.
Why is this necessary? Feels like typing “icon of a prompt engineer with a yellow background” should be easy, right? Yes and no. A prompt might involve adding the name of an era, painting style, lighting type, and more. It gets complex.
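To picture how those pieces stack up, here is a minimal Python sketch that glues a prompt together from its parts. The `build_prompt` helper and its parameter names are made up for illustration; they are not part of Midjourney or any other tool. The `--v 5` flag is borrowed from the example prompt in this section.

```python
def build_prompt(subject, era=None, medium=None, lighting=None, styles=(), flags=()):
    """Join the parts of an image prompt into one string.

    Hypothetical helper for illustration only: real tools simply accept
    the finished text, however you choose to assemble it.
    """
    # Put the era in front of the subject, e.g. "18th century still-life".
    parts = [f"{era} {subject}" if era else subject]
    if medium:
        parts.append(medium)
    if lighting:
        parts.append(lighting)
    parts.extend(styles)

    prompt = ", ".join(parts)
    # Tool-specific flags (such as "--v 5") go at the very end.
    if flags:
        prompt += " " + " ".join(flags)
    return prompt


prompt = build_prompt(
    subject="still-life with ornate plates",
    era="18th century",
    medium="oil on wood",
    lighting="soft candlelight",
    styles=["dreamlike imagery", "mysterious backdrops"],
    flags=["--v 5"],
)
print(prompt)
```

Every optional argument is one more knob to turn, which is why a prompt that starts as one short sentence tends to grow.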
Here’s an example of a prompt:
18th century Passover still-life with ornate plates, boiled egg, matzoh, wine goblets and herbs, tulip and hyacinth in a hand-blown glass, oil on wood, c, in the style of trompe-l'œil illusionistic detail, surrealist: dreamlike imagery, mysterious backdrops, barbizon school --upbeta --v 5.
See how specific it is? It names the flowers, the surrealist style, the dreamlike atmosphere, and more. That’s prompt engineering. Here’s the outcome.
He’s learning languages
Large Language Model (LLM)
A large language model (LLM) is a type of AI trained on very large text data sets so that it can understand, summarize, generate, and predict text.
In more technical circles, ChatGPT is referred to as an LLM, not just an AI. Saying “ChatGPT is an AI” is like saying “I love music”: you probably do love music, so it’s correct, but it’s a bit vague, and specifying a genre (large language model or lo-fi lounge music) matters.
Midjourney’s sailboat is effectively mid journey
Midjourney is a text-to-image service and one of the most popular generative AI programs.
It’s been fed over a hundred million images, and continues to improve. As newer versions come out, the images become more and more realistic. You've probably already seen some viral images that it has created, such as the pope in a puffer coat.
Access: Try it out here. It used to be free; now plans start at $10/month.
Competitors: Stable Diffusion (popular because it works well with other programs and is almost entirely free), DALL-E (created by OpenAI).
For example, the images in this article were all created using Midjourney.
How it works: I type a prompt, and Midjourney gives me four outputs. I can select one of the four to get a high-res version, or I can repeat for four more outputs, ad infinitum, until I find one I like. It can be very time-consuming, but less so than learning graphic design. The icons you see on this page took an average of 30-40 minutes each, after conceptualizing the core idea. Click clack (x100), beautiful.
The prompt above involved a sailboat, since that's Midjourney's logo.
Me calmly and clearly explaining to ChatGPT why I need him to do my job so I can nap
Easy: ChatGPT is a chatbot that knows almost everything. If Siri is a newborn, ChatGPT is its triple PhD-holding older sibling.
Hard: ChatGPT is a language model that was trained on a vast amount of data. It can generate human-like text. It does not have the ability to "know" or comprehend information in the way that humans do. It generates text based on patterns and associations from the billions of web pages and other texts it was trained on.
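To make "patterns and associations" a little more concrete, here is a toy Python sketch. It is emphatically not how ChatGPT works under the hood (ChatGPT is a large neural network, and this is a few lines of word counting), and the function names and the tiny training sentence are invented for illustration. The core idea is similar in spirit, though: look at what tended to come next in the training text, and predict that.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# A made-up, very small "training set".
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)

print(predict_next(model, "the"))     # "cat" follows "the" most often here
print(predict_next(model, "dragon"))  # never seen in training, so None
```

The toy model has no idea what a cat is. It only knows that "cat" showed up after "the" more often than anything else, which is the "no real knowing" point above in miniature.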
GPT stands for Generative Pre-trained Transformer. If you remember that, it should get you some points next time you're at trivia.
How it works: You ask a question, it answers the question. Think of it as that friend who's a great listener but cannot read between the lines. But if you tell your friend exactly what you need, how, where, and when, they'll listen and do precisely as you say.
ChatGPT was created by the research organization OpenAI. The organization was founded in 2015 by a group that included Sam Altman (former president of startup accelerator Y Combinator) and Elon Musk (popularized EVs and overpaid for Twitter), with early backing from Reid Hoffman (co-founder of LinkedIn), Peter Thiel (PayPal man, notorious contrarian), and others.
Iconic 90s PC and that background. That background.
Microsoft, maker of PowerPoint and the iconic XP background, is on the AI map predominantly due to its investment in OpenAI: $1 billion in 2019, followed by a reported $10 billion in 2023.
It flexed Bing AI (think of it as ChatGPT specialized in search), and introduced Microsoft 365 Copilot, its AI companion for Office, with a cool demo. It's like Clippy if Clippy used performance-enhancing drugs.
Microsoft powers almost all business applications. They own B2B distribution. Think of that one cave-dwelling friend of yours who still uses Internet Explorer. Thanks to Microsoft's 30-year-old chokehold on business applications, even that person would now also have easy access to an AI app.
Microsoft vs. Google: It's not often that Microsoft wins, and even now, it's not certain it has. Google researchers introduced the transformer architecture in 2017, which made ChatGPT possible. Google has the R&D prowess, but Microsoft was early with its Copilot release and owns B2B distribution. Google's AI, Bard, shows promise but hasn't shown much else so far.
Click, clack, beautiful
Future of Work
I’m not a 3D illustrator, but I can now create 3D illustrations. What does that mean? If we can do text-to-image now, how much longer until we can do proper text-to-video? Could I auto-generate the next season of The Last of Us? What about HBO or Hollywood? What about producers, directors, videographers, etc.? And that’s just one industry. Consider ChatGPT and copywriters or consultants.
Maybe we’re doomed. Some say we might be coming close to the end of work as we know it. As soon as the Singularity (ahhh) comes about, it’s over, the AI takes over everything and we go home. Permanent vacation. This camp is behind much of the chatter about universal basic income and thinks AI bosses (however controversial) might be closer than we think. The reckoning.
Maybe it’s fine. Others argue that we’ve always been able to adjust and shift work as digital transformation came about. The market for cabs crashed in favor of Uber. Unemployment is low, the world balances out and all will be well because ultimately automating low-level work is a good thing.
TL;DR: Am I losing my job soon? Probably not. But it is very likely that your job will shift, as preliminary studies suggest. That shift would be away from execution-oriented tasks, possibly toward more social ones.
Artificial General Intelligence or AGI
A year ago, there was no ChatGPT. Then it launched in November 2022, built on the GPT-3.5 model, and as of March 2023 we have GPT-4. What will come in the next version? How much better can it get? Where does it end? Does it end?
Every robot movie you’ve ever seen has the same plot. Robo go loco and beep boop bye world. In old-school sci-fi, this usually was due to something called the singularity. AGI is like the sexier rebrand.
The singularity refers to a hypothetical point in time at which AGI, an AI that can understand everything humans can and possibly even surpass human capabilities, becomes so advanced that it irreversibly changes human civilization. The definition is vague on purpose but generally dystopian. Some definitions reference “dawn of the machine era” (bit dramatic).
Elon and others want to hit the brakes. They signed an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. Some think a pause won’t work because development like this cannot be controlled or stopped.
Others say the crux of the issue is AI alignment, meaning how well the AI does what its designers intend, and say that research needs to go into safety, transparency, and responsible use of AI.
ChatGPT can train on your data. Your ChatGPT is the same as everyone else's, and it learns from all of us (the Borg, anyone?). How does it tell private info apart from material it should “learn” from? Does it adhere to GDPR?
For example, if I give ChatGPT my credit card info and another user asks for an example of credit card information, will it give away what it has 'learned' from me? In this scenario, it normally wouldn’t. OpenAI says it “works to remove personal information from training data where feasible”. But things can fall through and that might not be enough for European lawmakers.
Italy temporarily banned ChatGPT over privacy concerns. Germany and other European regulators are watching closely. They might follow suit.
Some say a ban is positive. Consumers should understand ChatGPT is not watertight and lawmakers are watching closely for any leaks.
Others share concerns that denied access to game-changing tools like ChatGPT might disadvantage them compared to the global workforce.
Sci-fi writer William Gibson famously said, "The future is already here. It's just not very evenly distributed."
What the distribution of AI fluency looks like in the future will hugely impact the distribution of economic power and individual versatility. But you just read all this, so you're one step in the right direction.
Stay hydrated, eat vegetables. I wish you nice human things like sunny springtime skies, and we’ll chat soon.
PS: I hope you enjoyed the images. They took me a long time to make, and I tried to keep them consistent and conceptually relevant to the key ideas.
Can you guess what this stands for?