HeadlinesBriefing.com

Z.AI Launches GLM-Image Open-Source Model

DEV Community

Z.AI has released GLM-Image, which it bills as the first open-source, industrial-grade autoregressive image generation model. The hybrid system pairs a 9B-parameter autoregressive module with a 7B-parameter diffusion decoder and targets complex text rendering and knowledge-intensive scenarios, two areas where traditional diffusion models struggle.

Unlike Stable Diffusion or Midjourney, GLM-Image works in two stages: it first generates visual tokens autoregressively, then decodes them into an image via diffusion. This architecture excels at rendering accurate text in posters and diagrams, with reported word accuracy of 91.16% on English benchmarks and 97.88% on Chinese text.
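The two-stage flow described above can be illustrated with a toy sketch. This is not GLM-Image's code or API: the function names, the 16-entry codebook, and the arithmetic standing in for the neural networks are all hypothetical, chosen only to show the control flow (tokens produced one at a time from prior context, then a noise-to-signal refinement loop conditioned on those tokens).

```python
import random
from typing import List

VOCAB_SIZE = 16   # hypothetical toy codebook size
NUM_TOKENS = 8    # hypothetical toy sequence length


def generate_visual_tokens(prompt: str, num_tokens: int = NUM_TOKENS, seed: int = 0) -> List[int]:
    """Stage 1 (toy): emit discrete tokens one at a time.

    Each new token depends on the prompt and on every token emitted so far,
    which is the defining property of autoregressive generation.
    """
    rng = random.Random(f"{seed}:{prompt}")
    tokens: List[int] = []
    for _ in range(num_tokens):
        # Stand-in for a transformer forward pass over the current context.
        context = sum(tokens) + len(prompt)
        tokens.append((context + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE)
    return tokens


def diffusion_decode(tokens: List[int], steps: int = 10, seed: int = 0) -> List[float]:
    """Stage 2 (toy): start from noise and denoise step by step.

    The tokens act as the conditioning signal that the refinement moves toward.
    """
    rng = random.Random(seed)
    target = [t / (VOCAB_SIZE - 1) for t in tokens]   # what the tokens "describe"
    x = [rng.gauss(0.0, 1.0) for _ in tokens]         # pure Gaussian noise
    for step in range(steps):
        # Remove a growing share of the remaining error; the last step removes all of it.
        alpha = 1.0 / (steps - step)
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, target)]
    return x


if __name__ == "__main__":
    toks = generate_visual_tokens("poster with the word SALE")
    print("visual tokens:", toks)
    print("decoded values:", [round(v, 3) for v in diffusion_decode(toks)])
```

In the real model both stages are large neural networks; the sketch only mirrors the division of labor, in which the autoregressive stage decides what goes where and the diffusion stage fills in the detail.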

Production use demands serious hardware: 80GB of VRAM is recommended for optimal performance, and generation takes around 64 seconds per 1024×1024 image on an H100. While the model is heavy, its MIT license permits commercial deployment.
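For a rough sense of scale, the figures above can be turned into back-of-the-envelope numbers. The sketch below assumes 16-bit weights (2 bytes per parameter), which the article does not state, and it counts weights only; activations, caches, and framework overhead come on top, so it does not by itself account for the 80GB recommendation.

```python
SECONDS_PER_IMAGE = 64    # from the article: one 1024x1024 image on an H100
AR_PARAMS = 9e9           # autoregressive module
DECODER_PARAMS = 7e9      # diffusion decoder
BYTES_PER_PARAM = 2       # assumption: bf16/fp16 weights


def images_per_hour(seconds_per_image: float) -> float:
    """Sequential throughput of a single GPU with no batching."""
    return 3600 / seconds_per_image


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(images_per_hour(SECONDS_PER_IMAGE))                                # 56.25
    print(weight_memory_gb(AR_PARAMS + DECODER_PARAMS, BYTES_PER_PARAM))     # 32.0
```

Under these assumptions a single H100 produces roughly 56 images per hour, and the combined 16B parameters occupy about 32GB before any working memory is counted.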

Developers can access the model through Hugging Face or GitHub, with pipelines available for both transformers and SGLang. Z.AI is also working on vLLM-Omni support to speed up inference and broaden adoption.