GLM-5 is the fifth generation of large language models created by Zhipu AI (Z.ai), a Chinese AI company founded in 2019. It is designed for complex reasoning, coding, creative writing, and agentic intelligence. GLM-5 employs a Mixture of Experts (MoE) architecture with approximately 745 billion total parameters, of which roughly 44 billion are active during inference. It supports context windows of up to 200,000 tokens and uses the DeepSeek Sparse Attention mechanism to process long sequences efficiently. GLM-5 has demonstrated strong performance in logical reasoning, programming, and agent systems, achieving top scores among open-source models in these areas. It was trained entirely on Huawei Ascend chips, marking a step towards AI infrastructure independence.
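The gap between total and active parameters comes from the MoE design: a router selects only a few experts per token, so most expert weights sit idle on any given forward pass. A minimal top-k routing sketch illustrates the idea (the expert count, dimensions, and router here are made-up toy values, not GLM-5's actual configuration):

```python
import numpy as np

def moe_layer(x, experts_w, router_w, k=2):
    """Toy MoE layer: route a token to its top-k experts and mix their
    outputs, weighted by softmax-normalized router scores.
    x: (d,) token vector; experts_w: (n_experts, d, d); router_w: (n_experts, d)
    """
    scores = router_w @ x                        # one routing score per expert
    top = np.argsort(scores)[-k:]                # indices of the top-k experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                     # softmax over selected experts
    # Only k of n_experts expert matrices are used -> "active parameters"
    return sum(w * (experts_w[i] @ x) for i, w in zip(top, weights))

rng = np.random.default_rng(0)
d, n_experts, k = 8, 16, 2
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))
router = rng.standard_normal((n_experts, d))
y = moe_layer(x, experts, router, k)
active_frac = k / n_experts  # fraction of expert parameters touched per token
```

With 2 of 16 experts active, only 12.5% of the expert weights participate per token; GLM-5's reported 44B-active-of-745B ratio (about 6%) reflects the same principle at scale.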
GLM-5 is accessible through various channels, including the Z.ai chat platform, an API, and local deployment using publicly released model weights. The weights are available on platforms such as Hugging Face and ModelScope under the MIT license, which permits commercial use, adaptation, and further development. As of February 2026, GLM-5 is also accessible via AI Gateway and Atlas Cloud. GLM-5 is competitively priced; while rates vary across platforms, GLM-5 (Reasoning) is priced at $1.00 per 1 million input tokens and $3.20 per 1 million output tokens. Zhipu AI also supports running GLM-5 on other non-NVIDIA chips, including Moore Threads, Cambricon, Kunlun Chip, MetaX, Enflame, and Hygon.
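The quoted per-token rates translate directly into a per-request cost; a small sketch of the arithmetic (rates are the GLM-5 (Reasoning) figures above; the token counts are made-up examples, and actual billing may differ by platform):

```python
INPUT_PER_M = 1.00   # USD per 1M input tokens, GLM-5 (Reasoning)
OUTPUT_PER_M = 3.20  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    """Estimated cost in USD for one request at the quoted rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 150k-token context plus a 4k-token completion:
cost = request_cost(150_000, 4_000)  # 0.15 + 0.0128 = 0.1628 USD
```

Because output tokens cost 3.2x input tokens, long reasoning traces or verbose completions dominate the bill even when the prompt is large.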