Prompt Recipe

At its most basic, Hyperglot is a text-to-image model: give it a text prompt, and it returns an image that matches the text.

Step 1: Hyperglot generates a random tensor in the latent space. This tensor is controlled by the seed of the random number generator: if the seed is set to a specific value, the same random tensor is always generated. This tensor is the representation of your image in latent space, but at this point it is pure noise.
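The seed behaviour can be illustrated with a minimal sketch using Python's standard `random` module. It assumes the latent is filled with standard-normal samples and uses the 4x64x64 shape given in Step 2; `make_initial_latent` is a hypothetical helper, not part of Hyperglot.

```python
import random

# Latent shape as described in Step 2: 4 channels of 64x64 values.
LATENT_SHAPE = (4, 64, 64)


def make_initial_latent(seed, shape=LATENT_SHAPE):
    """Hypothetical helper: return a nested list of standard-normal samples.

    A dedicated random.Random instance is seeded, so the same seed always
    yields the same tensor, independent of the global random state.
    """
    rng = random.Random(seed)
    channels, height, width = shape
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


if __name__ == "__main__":
    a = make_initial_latent(42)
    b = make_initial_latent(42)
    c = make_initial_latent(7)
    print(a == b)  # same seed: identical noise
    print(a == c)  # different seed: different noise
```

Fixing the seed is what makes a generation reproducible: with the same seed, prompt, and settings, the starting noise and therefore the final image are the same.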

Step 2: The noise predictor U-Net takes the noisy latent image and the text prompt as input and predicts the noise in latent space (a 4x64x64 tensor).

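Step 3: Subtract the predicted latent noise from the latent image. The result is your new latent image. Steps 2 and 3 are repeated for a number of sampling steps, so the latent image becomes progressively less noisy.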
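Steps 2 and 3 can be sketched as a loop. This is a toy illustration under loud assumptions: `predict_noise` is a hypothetical stand-in for the trained U-Net (it ignores the prompt and simply treats 10% of each value as noise), the default of 20 sampling steps is an assumed value, and the predicted noise is subtracted directly, following the simplified description above. Real samplers typically scale the predicted noise according to a noise schedule.

```python
import random
from typing import Callable, List

Latent = List[List[List[float]]]  # channels x height x width


def predict_noise(latent: Latent, prompt: str) -> Latent:
    """Hypothetical stand-in for the noise predictor U-Net.

    A real U-Net is a trained network conditioned on the prompt; this toy
    version ignores the prompt and claims that 10% of each value is noise.
    """
    return [[[0.1 * v for v in row] for row in channel] for channel in latent]


def denoise_step(latent: Latent, prompt: str,
                 predictor: Callable[[Latent, str], Latent] = predict_noise) -> Latent:
    """Step 2 (predict the latent noise) followed by Step 3 (subtract it)."""
    noise = predictor(latent, prompt)
    return [
        [[x - n for x, n in zip(row, noise_row)]
         for row, noise_row in zip(channel, noise_channel)]
        for channel, noise_channel in zip(latent, noise)
    ]


def sample(latent: Latent, prompt: str, num_steps: int = 20) -> Latent:
    """Repeat steps 2 and 3; num_steps=20 is an assumed default."""
    for _ in range(num_steps):
        latent = denoise_step(latent, prompt)
    return latent


if __name__ == "__main__":
    rng = random.Random(42)
    start = [[[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
             for _ in range(4)]
    result = sample(start, "a cat sitting on a windowsill")
    print(len(result), len(result[0]), len(result[0][0]))  # 4 64 64
```

Passing the predictor as a parameter keeps the loop independent of any particular model, so a real noise predictor could be swapped in without changing `sample`.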

Step 4: Finally, the VAE decoder converts the latent image back to pixel space. This is the image that Hyperglot returns.
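The shape change performed by the decoder can be illustrated with a toy stand-in. This is not a VAE: `toy_decode` is a hypothetical function that only mimics the bookkeeping of going from a small latent grid to a larger RGB pixel grid. It assumes each latent cell covers an 8x8 pixel patch (the factor used by Stable Diffusion v1's VAE; Hyperglot's factor is not stated here) and uses a made-up fixed matrix to mix the 4 latent channels into R, G, and B.

```python
from typing import List

Latent = List[List[List[float]]]  # channels x height x width
Image = List[List[List[float]]]   # height x width x RGB

# Assumption: each latent cell corresponds to an 8x8 pixel patch.
SCALE = 8

# Hypothetical fixed weights mixing 4 latent channels into R, G, B.
# A real VAE decoder learns this mapping with convolutional layers.
CHANNEL_MIX = [
    [0.30, 0.20, -0.10, 0.05],
    [0.20, 0.30, 0.10, -0.05],
    [0.10, -0.10, 0.30, 0.20],
]


def toy_decode(latent: Latent, scale: int = SCALE) -> Image:
    """Toy stand-in for a VAE decoder: latent grid -> larger RGB pixel grid."""
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    image: Image = []
    for y in range(height * scale):
        row = []
        for x in range(width * scale):
            # Nearest-neighbour lookup of the latent cell covering this pixel.
            cell = [latent[c][y // scale][x // scale] for c in range(channels)]
            rgb = [sum(w * v for w, v in zip(weights, cell)) for weights in CHANNEL_MIX]
            # Shift from roughly [-1, 1] to [0, 1] and clamp to a valid pixel range.
            row.append([min(1.0, max(0.0, (v + 1.0) / 2.0)) for v in rgb])
        image.append(row)
    return image


if __name__ == "__main__":
    small = [[[0.0] * 2 for _ in range(2)] for _ in range(4)]  # 4x2x2 latent
    img = toy_decode(small)
    print(len(img), len(img[0]), len(img[0][0]))  # 16 16 3
```

Under the assumed factor of 8, a 4x64x64 latent would decode to a 512x512 RGB image. The small latent in the usage example keeps the pure-Python loop fast.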

Here is the evolution of an image at each sampling step:

Prompt: