Over the last few months, generating images by typing in bits of text has become increasingly common, but the next iteration of image generation might take only some movement of the mouse.
A new paper has demonstrated the creation and manipulation of images by simply dragging specific points on them. The idea comes from a paper titled “Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold”, written by researchers from the Max Planck Institute in Germany and Google. “Synthesizing visual content that meets users’ needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality,” the paper says. “In this work, we study a powerful yet much less explored way of controlling GANs, that is, to “drag” any points of the image to precisely reach target points in a user-interactive manner,” it continues.
“To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc,” the abstract says.
And the demonstrations are pretty amazing. A user can simply select points on an image and move them around, and the image changes automatically to match the new positions. Users can thus make a subject smile, a cat shut its eyes, or a horse splay its legs. More impressively, the results are almost instantaneous, which makes this a remarkably simple way to manipulate images.
Drag Your GAN might just be the latest iteration in image creation, a field which has been radically transformed over the last few years. Initially, paid tools like DALL-E were able to create images through text prompts. Not long after, open-source alternatives like Stable Diffusion emerged, which created an explosion of text-generated photos, especially through technologies like Dreambooth. Midjourney, meanwhile, has been raising the bar on creating life-like images, but currently remains available only on Discord. These tools are now going mainstream — Adobe has recently integrated generative AI tools within Photoshop, letting users manipulate images through text prompts directly inside the software. And the field doesn’t seem to be done yet — demos like Drag Your GAN now allow users to edit images simply by dragging them around with a mouse. It remains to be seen whether Drag Your GAN finds a mainstream implementation, but the field of image generation might soon be completely unrecognizable from what it was just a few years ago.