Sunday, October 09, 2022

Unnatural Crops Explained

The developments in text-to-image AI art are amazing. Things are changing, and image quality is improving almost weekly. However, one thing I had noticed staying fairly constant was unnatural-looking crops: the weird truncating of subjects, particularly people. Surely this was not a new artistic trend I had no knowledge of, or a sign that the artists whose work is being used to train these systems had an aversion to conventional composition.

Example of a headless figure generated from a Stable Diffusion prompt

Am I an artist now? John Singer Sargent


It turns out there is a simpler explanation (see the quote from a NovelAI blog post below). The unnatural crops result from the training images being converted to a square format, so that they all share the same aspect ratio, by simply keeping the center of each image and discarding the rest.

 Aspect Ratio Bucketing

One common issue of existing image generation models is that they are very prone to producing images with unnatural crops. This is due to the fact that these models are trained to produce square images. However, most photos and artworks are not square. However, the model can only work on images of the same size at the same time, and during training, it is common practice to operate on multiple training samples at once to optimize the efficiency of the GPUs used. As a compromise, square images are chosen, and during training, only the center of each image is cropped out and then shown to the image generation model as a training example.
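To make the center-crop step concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea described in the quote, not the actual preprocessing code of any model; the function names, the 512x768 image size, and the head position are all assumptions chosen for the example.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def is_cropped_out(y, box):
    """True if a feature at pixel row y falls outside the crop box."""
    _, top, _, bottom = box
    return not (top <= y < bottom)


# Hypothetical portrait: 512 wide, 768 tall, with the head around row 60.
box = center_crop_box(512, 768)
print(box)                      # (0, 128, 512, 640)
print(is_cropped_out(60, box))  # True: the head never reaches the model
```

With a tall portrait, the top and bottom 128 rows are thrown away, so a model trained on many such crops sees plenty of figures without heads or feet and learns to reproduce them.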
