
Benj Edwards / Ars Technica
Last week, Swiss software engineer Matthias Bühlmann discovered that the popular image synthesis model Stable Diffusion can compress existing bitmap images at higher compression ratios with fewer visual artifacts than JPEG or WebP, although there are important caveats.
Stable Diffusion is an AI image synthesis model that typically creates images based on text descriptions (called "prompts"). The model learned this ability by studying millions of images pulled from the internet. During training, it establishes statistical relationships between images and related words, condenses important information about each image into a much smaller representation, and stores that representation as "weights," mathematical values that represent what the AI image model knows.
When Stable Diffusion analyzes and "compresses" images into weight form, they reside in what researchers call "latent space," which is a way of saying that they exist as a kind of fuzzy potential that can be decoded back into images. With Stable Diffusion 1.4, the weights file is roughly 4GB, but it represents knowledge about hundreds of millions of images.

While most people use Stable Diffusion with text prompts, Bühlmann bypassed the text encoder and instead fed his images through Stable Diffusion's image encoder, which converts a low-precision 512×512 image into a higher-precision 64×64 latent-space representation. At that point, the image occupies a much smaller data size than the original, but it can still be expanded (decoded) back into a 512×512 image with fairly good results.
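The nominal reduction from that encoding step can be sketched with simple arithmetic. This is a rough illustration only: Stable Diffusion's latent uses 4 channels per 64×64 location, and the actual on-disk size depends on how the latent values are quantized and stored, which the arithmetic below ignores.

```python
# Rough size arithmetic for Stable Diffusion's 512x512 -> 64x64 encoding step.
# Assumption: a 4-channel latent, as in Stable Diffusion 1.x.

pixel_values = 512 * 512 * 3   # RGB values in the source image
latent_values = 64 * 64 * 4    # values in the latent representation

print(pixel_values)            # 786432
print(latent_values)           # 16384
print(pixel_values // latent_values)  # 48 -- ~48x fewer values to store
```

The roughly 48× drop in value count is why the latent can be stored so compactly, though each latent value carries more precision than an 8-bit pixel channel.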
While running his tests, Bühlmann found that an image compressed with Stable Diffusion subjectively looked better at higher compression ratios (smaller file sizes) than JPEG or WebP. In one example, he shows a photograph of a slide (originally 768 KB) compressed down to 5.68 KB using JPEG, 5.71 KB using WebP, and 4.98 KB using Stable Diffusion. The Stable Diffusion image appears to retain more resolved detail and shows fewer obvious compression artifacts than the other formats.
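Plugging in the file sizes quoted above, the effective compression ratios land in the same ballpark for all three formats (this is just arithmetic on the reported numbers, not an independent measurement):

```python
# Compression ratios implied by the article's reported file sizes.
original_kb = 768.0

for name, kb in [("JPEG", 5.68), ("WebP", 5.71), ("Stable Diffusion", 4.98)]:
    ratio = original_kb / kb
    print(f"{name}: {ratio:.0f}x smaller")
# Stable Diffusion comes out ahead at roughly 154x, versus ~135x for the others.
```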

Bühlmann's method currently comes with significant limitations, however: it isn't good with faces or text, and in some cases it can actually hallucinate detailed features in the decoded image that were not present in the source image. (You probably don't want your image compressor inventing details that don't exist.) Also, decoding requires the 4GB Stable Diffusion weights file and extra decoding time.
While this use of Stable Diffusion is an unconventional and more of a fun hack than a practical solution, it could potentially point toward a novel future use of image synthesis models. Bühlmann's code can be found on Google Colab, and more technical details about his experiment appear in his post on Towards AI.