Google introduces Whisk: an AI image generator that remixes your pictures

Google has launched Whisk, an experimental AI image generator that takes images as prompts instead of relying on text descriptions, letting users guide the generation process visually.

With Whisk, users can provide images to suggest the subject, scene, and style of their desired output. Multiple images can be used for each of these elements, providing nuanced guidance to the AI. For those who may not have specific images in mind, Whisk offers a feature that automatically generates sample images as prompts. While these samples appear to be AI-generated themselves, they serve as a starting point for users to explore and refine.

After receiving the image prompts, Whisk generates a series of images, each accompanied by a text prompt that describes its key features. Users can then favorite, download, or further refine the generated images by modifying the text prompts or providing additional visual cues.

Google emphasizes that Whisk is intended for “rapid visual exploration” rather than for creating highly polished images. The company acknowledges that the AI may not always capture the user’s intent, which is why the tool provides options for refining and iterating on the generated results.

Whisk is powered by the latest iteration of Google’s Imagen 3 image generation model. This updated model enhances the tool’s ability to interpret visual prompts and generate diverse outputs.

Alongside Whisk, Google introduced Veo 2, the next generation of its video generation model. Google says Veo 2 has an improved understanding of cinematography and a reduced tendency to hallucinate unrealistic details. Veo 2 will initially be available through Google’s VideoFX platform and will be integrated into YouTube Shorts and other products next year.