MoE
Zhipu AI · 2026-02

GLM-5 744B

MoE decoder architecture that combines Multi-head Latent Attention (MLA) with DeepSeek Sparse Attention.

GLM-5 744B decoder block architecture. Attention: MLA + DeepSeek Sparse Attention. Normalization: RMSNorm. FFN: Mixture of Experts (40B active parameters). Position encoding: RoPE. Scale: 744B total parameters, 203K context, 78 layers. Decoder type: MoE.


Architecture Specifications

Parameters: 40B active / 744B total
Context Window: 203K
Decoder Type: MoE
Attention: MLA + DeepSeek Sparse Attention
Active Parameters: 40B
Layers: 78
Hidden Size: 6,144
Vocabulary Size: 155K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Zhipu AI
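
To put the two parameter counts in proportion, the short sketch below works out what share of the model is active per token. It uses only the 744B and 40B figures listed above; the variable names are illustrative and not part of any GLM-5 tooling.

```python
# Figures taken from the specification list above (in billions of parameters).
TOTAL_PARAMS_B = 744
ACTIVE_PARAMS_B = 40

# Share of the weights that take part in a single token's forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# How many times larger the full model is than the active slice.
total_to_active_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

print(f"active share: {active_fraction:.1%}")
print(f"total / active: {total_to_active_ratio:.1f}x")
```

Under these figures roughly one parameter in nineteen is used for any given token, which is the usual trade a Mixture of Experts model makes: memory footprint scales with the total count, per-token compute with the active count.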

Key Features

Multi-head Latent Attention
Expert routing
Layer mix: 78 MLA
KV cache: 87.8 KiB/token
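
The KV cache figure can be checked with a back-of-the-envelope calculation. The sketch below assumes, without confirmation from this page, that each MLA layer caches a 512-wide compressed latent plus a 64-wide RoPE key per token (the widths used by DeepSeek-V3-style MLA) and stores them at 2 bytes per value. Only the layer count comes from the listing; the function name and the 203,000-token reading of "203K" are illustrative assumptions.

```python
def mla_kv_bytes_per_token(layers, latent_dim, rope_dim, bytes_per_value):
    """Bytes of MLA cache one token occupies, summed over all layers."""
    return layers * (latent_dim + rope_dim) * bytes_per_value


LAYERS = 78           # from the specification list
LATENT_DIM = 512      # assumption: compressed KV latent width
ROPE_DIM = 64         # assumption: decoupled RoPE key width
BYTES_PER_VALUE = 2   # assumption: 16-bit cache (bf16 or fp16)

per_token_bytes = mla_kv_bytes_per_token(
    LAYERS, LATENT_DIM, ROPE_DIM, BYTES_PER_VALUE
)
per_token_kib = per_token_bytes / 1024  # 89,856 bytes = 87.75 KiB


def cache_gib(tokens):
    """Cache size in GiB for one sequence of the given length."""
    return tokens * per_token_bytes / 1024**3


# Assumption: "203K" is read as 203,000 tokens.
full_context_gib = cache_gib(203_000)

print(f"{per_token_bytes} bytes/token = {per_token_kib} KiB/token")
print(f"full context: about {full_context_gib:.1f} GiB per sequence")
```

Under these assumptions the result is 87.75 KiB per token, consistent with the 87.8 KiB/token listed, and a single full-length sequence would need roughly 17 GiB of cache. The estimate counts only the MLA latent and RoPE key; any extra state kept by the sparse-attention mechanism is not included.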