GLM-5 744B

Mixture-of-Experts (MoE) decoder architecture that combines Multi-head Latent Attention (MLA) with DeepSeek Sparse Attention.


40B active / 744B total | 203K context | MLA + DeepSeek Sparse Attention | MoE

Architecture Specifications

Parameters: 40B active / 744B total
Context Window: 203K tokens
Decoder Type: MoE
Attention: MLA + DeepSeek Sparse Attention
Active Parameters: 40B
Layers: 78
Hidden Size: 6,144
Vocabulary Size: 155K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Zhipu AI
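
As a rough illustration of what the 40B active / 744B total split implies, the sketch below computes the share of parameters used per token and a weights-only memory estimate. The 2 bytes per parameter (16-bit weights) is an assumption that the specifications above do not state, and the function names are purely illustrative.

```python
# Figures taken from the specification list above.
ACTIVE_PARAMS = 40e9
TOTAL_PARAMS = 744e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that participate in one forward pass."""
    return active / total


def weight_bytes(params: float, bytes_per_param: int = 2) -> float:
    """Weights-only footprint; bytes_per_param=2 assumes 16-bit storage."""
    return params * bytes_per_param


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
total_gb = weight_bytes(TOTAL_PARAMS) / 1e9  # decimal gigabytes

print(f"active share per token: {frac:.2%}")
print(f"weights at 16-bit (assumed): {total_gb:.0f} GB")
```

All 744B parameters must be resident in memory even though only about 5% are used per token, which is why the total count drives memory sizing and the active count drives compute.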

Multi-head Latent Attention | Expert routing | Layer mix: 78 MLA | KV cache: 87.8 KiB/token
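
The 87.8 KiB/token KV cache figure can be reproduced with a short calculation. The sketch below assumes a DeepSeek-V3-style MLA cache, with a 512-dimensional compressed latent plus 64 RoPE dimensions stored per layer in 16-bit precision. None of these three values appears on this page; they are assumptions chosen because they match the published figure. Reading "203K" as 203,000 tokens is also an approximation.

```python
KIB = 1024
GIB = 1024 ** 3

LAYERS = 78            # from the specifications above
LATENT_DIM = 512       # assumed compressed KV latent width (not stated on the page)
ROPE_DIM = 64          # assumed decoupled RoPE key width (not stated on the page)
BYTES_PER_VALUE = 2    # assumed 16-bit cache entries
CONTEXT_TOKENS = 203_000  # "203K" read as a round number (approximation)


def mla_kv_bytes_per_token(layers: int, latent_dim: int, rope_dim: int,
                           bytes_per_value: int) -> int:
    """MLA caches one latent vector plus one RoPE key per layer per token."""
    return layers * (latent_dim + rope_dim) * bytes_per_value


per_token = mla_kv_bytes_per_token(LAYERS, LATENT_DIM, ROPE_DIM, BYTES_PER_VALUE)
per_token_kib = per_token / KIB
full_context_gib = CONTEXT_TOKENS * per_token / GIB

print(f"KV cache per token: {per_token_kib:.2f} KiB")
print(f"KV cache at full context, one sequence: {full_context_gib:.1f} GiB")
```

Under these assumptions the per-token result is 87.75 KiB, consistent with the 87.8 KiB/token shown above, and a single full-length sequence needs roughly 17 GiB of cache.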
