MoE
Zhipu AI · 2026-02
GLM-5 744B
MoE decoder architecture using MLA (Multi-head Latent Attention) combined with DeepSeek Sparse Attention.
GLM-5 744B decoder block architecture. Attention: MLA + DeepSeek Sparse Attention. Normalization: RMSNorm. FFN: Mixture of Experts (40B active parameters). Position encoding: RoPE. Scale: 744B total parameters, 203K-token context, 78 layers. Decoder type: MoE.
40B active / 744B total | 203K context | MLA + DeepSeek Sparse Attention | MoE
Architecture Specifications
Parameters: 40B active / 744B total
Context Window: 203K
Decoder Type: MoE
Attention: MLA + DeepSeek Sparse Attention
Active Parameters: 40B
Layers: 78
Hidden Size: 6,144
Vocabulary Size: 155K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Zhipu AI
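As a rough illustration of what the parameter counts above imply, the sketch below computes the share of parameters active per token and a weights-only memory estimate at a few storage precisions. The function names and the bytes-per-parameter values are assumptions chosen for illustration, not published deployment figures; a real footprint also includes activations, KV cache, and runtime overhead.

```python
# Figures taken from the specification list above.
TOTAL_PARAMS = 744e9
ACTIVE_PARAMS = 40e9

# Assumed storage widths in bytes per parameter (illustrative only).
PRECISIONS = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def active_fraction(active=ACTIVE_PARAMS, total=TOTAL_PARAMS):
    """Share of the model's parameters used for each token."""
    return active / total


def weight_memory_gb(params, bytes_per_param):
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")
    for name, width in PRECISIONS.items():
        print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, width):,.0f} GB")
```

Under these assumptions, roughly one parameter in nineteen is active per token, while the full set of expert weights still has to be resident in memory.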
Key Features
Multi-head Latent Attention
Expert routing
Layer mix: 78 MLA
KV cache: 87.8 KiB/token
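The per-token KV cache figure can be turned into a memory estimate for a full context window. The sketch below does that arithmetic and also shows one set of per-layer dimensions that reproduces a value close to 87.8 KiB: a 512-wide compressed KV latent plus 64 RoPE dimensions per layer, cached at 2 bytes per element. Those per-layer dimensions are an assumption borrowed from DeepSeek-style MLA configurations, not values stated on this page, and "203K" is assumed to mean 203,000 tokens.

```python
LAYERS = 78                      # from the specification list
KV_CACHE_KIB_PER_TOKEN = 87.8    # from the key features
CONTEXT_TOKENS = 203_000         # assumption: "203K" read as 203,000 tokens


def mla_cache_bytes_per_token(layers=LAYERS, latent_dim=512, rope_dim=64,
                              bytes_per_elem=2):
    """Bytes cached per token if each layer stores one compressed latent
    plus a shared RoPE key. latent_dim, rope_dim and bytes_per_elem are
    assumed values, not confirmed for this model."""
    return layers * (latent_dim + rope_dim) * bytes_per_elem


def kv_cache_gib(tokens, kib_per_token=KV_CACHE_KIB_PER_TOKEN):
    """KV cache size in GiB for a single sequence of `tokens` tokens."""
    return tokens * kib_per_token / 1024**2


if __name__ == "__main__":
    per_token = mla_cache_bytes_per_token()
    print(f"per token: {per_token} bytes = {per_token / 1024:.2f} KiB")
    print(f"full context: {kv_cache_gib(CONTEXT_TOKENS):.1f} GiB")
```

With these assumptions a single full-length sequence needs about 17 GiB of KV cache, so batch size at long context is bounded by cache memory as well as by weights.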