Comparison with FLUX VAE at 256x256 Resolution

architecture.

Input text prompts are shown below the images. Results are generated by generative models trained for 100K steps.

