Guidance scale
How to use guidance scale in Stable Diffusion.
The guidance scale, also known as the CFG (classifier-free guidance) scale, is a parameter that controls how closely the image generation process follows the text prompt. Increasing it produces images that adhere more closely to the input text.
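To make the effect of the parameter concrete, here is a minimal sketch of the guidance step as it is commonly formulated for classifier-free guidance. It is an assumption that any particular tool uses exactly this form, and the function name and toy values are illustrative only.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions.

    Common classifier-free guidance form (assumed here, not confirmed for
    any specific tool): start from the unconditional prediction and move
    toward the prompt-conditioned one by a factor of guidance_scale.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]

# Toy values standing in for the model's two predictions at one step.
uncond = [0.0, 0.0]
cond = [1.0, -1.0]

for scale in (7.0, 17.0, 28.0):
    print(scale, apply_guidance(uncond, cond, scale))
```

In this sketch, larger values push the result further in the direction the prompt suggests. That fits the stronger prompt adherence seen at high scales, and it suggests why very large values can distort the image.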
Setting the value to its maximum is not always the best approach, however: as the guidance scale increases, the generated images tend to become less diverse and lower in quality.
You can see this for yourself in the sample images below, which show several generations of "a swimming cat under the water" at different guidance values.
Observe the behavior at different points. When the guidance scale is set to 1, the text prompt is largely ignored. When the scale is set to its maximum value of 28, on the other hand, the prompt is followed strictly, but image quality is at its worst. Typically, the most creative and artistic results are obtained at a CFG scale of 7.
The right CFG value depends on your desired outcome as well as the complexity of the text prompt. It's always a good idea to experiment with different scales to find out whether more creative results or results that adhere strictly to the prompt better suit your purpose.
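One way to run such a comparison is to generate the same prompt at several scales while keeping the random seed fixed, so that the guidance scale is the only thing that changes. The sketch below assumes a hypothetical `generate` callable that you supply; `sweep_guidance` and `fake_generate` are illustrative names, not part of any library.

```python
from typing import Any, Callable, Dict, Sequence

def sweep_guidance(
    generate: Callable[[str, float, int], Any],
    prompt: str,
    scales: Sequence[float],
    seed: int = 0,
) -> Dict[float, Any]:
    """Generate one result per guidance scale, reusing the same seed.

    `generate` is a hypothetical callable taking (prompt, guidance_scale,
    seed) and returning an image; it is an assumption of this sketch.
    """
    return {scale: generate(prompt, scale, seed) for scale in scales}

# Stand-in generator so the sketch runs without a model. Replace it with
# a call into your own image generation backend.
def fake_generate(prompt: str, guidance_scale: float, seed: int) -> str:
    return f"<image prompt={prompt!r} cfg={guidance_scale} seed={seed}>"

results = sweep_guidance(
    fake_generate, "a swimming cat under the water", [1, 7, 17, 28]
)
for scale, image in results.items():
    print(scale, image)
```

With Hugging Face diffusers, for example, `generate` could wrap a Stable Diffusion pipeline call that passes `guidance_scale` and a seeded `generator`. The exact wiring depends on your setup and is not shown here.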
The illustrations below show an example of generating images with a complex text prompt:
A city in the future, hyperdetailed, volumetric lighting, landscape photography by Marc Adamus, Cinematic lighting, 8k, diminishing scale view
The prompt in this case has more words, and increasing the guidance scale makes the prompt's details more visible in the resulting images. For instance, when the guidance scale is set to 7, the image shows a less developed cityscape with shorter, older buildings and less futuristic architecture.
In contrast, when the guidance scale is set to a higher level, such as 17, the cityscape appears more advanced and technologically sophisticated. The streets are wider, with less pedestrian traffic and more advanced transportation, while the buildings are taller, sleeker, and more modern.
To generate images that capture more of the prompt's fine details, start with a higher guidance scale, between 12 and 19.