DeepFloyd IF consists of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates a 64 × 64 pixel image from a text prompt, and two super-resolution models that upscale it in turn, first to 256 × 256 and then to 1024 × 1024 pixels.
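
The cascade can be pictured as a short list of stages ordered from lowest to highest output resolution. The snippet below is only a minimal sketch of that arithmetic: it does not load or run any model, and the `Stage` class, the stage names, and the `upscale_factors` helper are hypothetical labels chosen for illustration, not identifiers from the DeepFloyd IF codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One diffusion module in the cascade (illustrative only)."""
    name: str          # hypothetical label, not an official model name
    output_size: int   # side length of the square output image, in pixels


# The three pixel diffusion modules, ordered from lowest to highest resolution.
CASCADE = [
    Stage("base", 64),
    Stage("super_resolution_1", 256),
    Stage("super_resolution_2", 1024),
]


def upscale_factors(stages):
    """Return the side-length ratio between each stage and the one before it."""
    return [nxt.output_size // prev.output_size
            for prev, nxt in zip(stages, stages[1:])]


if __name__ == "__main__":
    for stage in CASCADE:
        print(f"{stage.name}: {stage.output_size} x {stage.output_size}")
    print("upscale factors:", upscale_factors(CASCADE))
```

Under these assumptions, each super-resolution step multiplies the side length by the same factor, so the pixel count grows by that factor squared at every step.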