Google Labs has been releasing a stream of exciting experimental AI tools. Whisk is a new take on making AI-generated images. Currently, it is still painful to make the images one desires, whether you are working with prompt-only tools like DALL-E (now part of ChatGPT), tools that add more controls like Midjourney, or advanced tools like AUTOMATIC1111 or ComfyUI. Striking the balance between telling the AI everything that is “in your head” and leaving it room to be creative, so that one is pleasantly surprised, is hard. This is a great “Human-AI Collaboration” design challenge for the near future, and it seems Google agrees.
In essence, Whisk’s elevator pitch is to “use images as prompts”: one creates an image by providing a few sample images and a short prompt. Behind the scenes, my guess is that Whisk analyzes each uploaded image and parses it into an AI-generated text description. I did a quick test, and you can see what it does at the end.
Steps
First, go to https://labs.google/fx/tools/whisk. I used the [Start from Scratch] option, which lets you compose an image from three images. I chose the following randomly:
Subject
Scene
Style
Next, drop in the images, (optionally) add a simple prompt, and run. That’s it.
Results
Whisk returns two images at a time. Here are the immediate results from the above:
I feel it picks up more of a Pixar vibe than the SpongeBob vibe I was asking for. Here’s the fun part: click on the image and read the prompt it wrote.
So it basically takes the three images, writes a few lines for each, and combines them into the long prompt above. I tried modifying the “cartoon rendering” part a bit to see what results we might get, specifically “flat, 80s style,” “children’s storybook style,” etc.
(Sidenote: it’s funny that the explicit “annoyed” keyword doesn’t matter much, since Wednesday’s input image looks annoyed to begin with.)
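Based on the prompt Whisk displays, its composition step might look something like the sketch below. This is purely my guess at the mechanics; the function name, labels, and caption texts are hypothetical and not Google’s actual implementation.

```python
def compose_prompt(subject: str, scene: str, style: str, extra: str = "") -> str:
    """Combine per-image captions (subject / scene / style) into one long prompt,
    optionally appending the user's short free-text prompt at the end."""
    parts = [
        f"Subject: {subject}",
        f"Scene: {scene}",
        f"Style: {style}",
    ]
    if extra:
        parts.append(extra)  # the optional short prompt the user typed
    return " ".join(parts)

# Hypothetical captions standing in for what Whisk wrote about my three images:
prompt = compose_prompt(
    subject="an annoyed girl with dark braids",
    scene="a sunny underwater town",
    style="flat, 80s style cartoon rendering",
    extra="looking directly at the viewer",
)
print(prompt)
```

The point is that each input image contributes a self-contained caption, so swapping out one image (as I do with the scene below) only replaces that one chunk of the final prompt.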
Finally, I swapped the scene for an Eiffel Tower image and got the following.
Tip: ask AI to generate prompts
My takeaway is that Whisk lets beginners make images immediately, without going through a multi-step prompting process, which is painful (I can attest, having spent hours with various tools). To get the result one needs, it’s necessary to put into words what’s inside one’s head, and that’s not easy. So instead, Whisk cleverly asks for three images so it knows the who / where / how and gets a sense of what’s on the user’s mind.
This leads to a tip that could save time: regardless of the tool one uses to generate images, perhaps first grab a reference image or two and drop it into an LLM like ChatGPT, then ask it to help you write a prompt for image generation. Here’s one test to see what ChatGPT writes:
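For those who prefer scripting this tip, here is a minimal sketch assuming the OpenAI Python SDK (`pip install openai`) and an `OPENAI_API_KEY` in the environment. The instruction wording, model choice, and image URL are my own assumptions for illustration, not anything Whisk uses.

```python
import os

INSTRUCTION = (
    "Describe this reference image as a detailed prompt I could paste into "
    "an image generator. Cover the subject, setting, and style in a few lines."
)

def build_messages(image_url: str, instruction: str = INSTRUCTION) -> list:
    """Build a chat message pairing the instruction with a reference image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

# Only call the API when a key is actually configured (requires network access).
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model should work here
        messages=build_messages("https://example.com/reference.jpg"),
    )
    print(resp.choices[0].message.content)  # the prompt the LLM wrote for you
```

The returned text can then be pasted into whatever image tool you prefer, or edited first, which is essentially Whisk’s trick done by hand.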
Closing
Whisk is a simple, no-code / no-prompt prototype that helps hide the complexity of describing an image “in a thousand words.” Until we have a Neuralink-like ability to directly read what a user “sees in their head,” innovations like this are necessary to make the image-making process even more enjoyable.