MoE
Unknown · 2026-02
MiniMax M2.5 230B
MoE decoder architecture with GQA + QK-Norm attention mechanism.
MiniMax M2.5 230B decoder block architecture. Attention: GQA with QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B total parameters, 197K context, 62 layers. Decoder type: MoE.
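To make the attention description concrete, here is a minimal pure-Python sketch of grouped-query attention with QK normalization. It is an illustration under assumptions, not the model's implementation: the function names (`rms_norm`, `softmax`, `gqa_qk_norm`) are hypothetical, QK-Norm is assumed to be a per-head RMS normalization of queries and keys with the learned gain omitted, and RoPE is left out for brevity.

```python
import math

def rms_norm(vec, eps=1e-6):
    # Scale a vector to (approximately) unit root-mean-square; learned gain omitted.
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gqa_qk_norm(q, k, v):
    """Causal GQA with QK-Norm (sketch).

    q[h][t] is the query vector of query head h at position t;
    k[g][t] and v[g][t] are the key/value vectors of KV head g.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group = n_q_heads // n_kv_heads
    seq, d = len(q[0]), len(q[0][0])

    out = []
    for h in range(n_q_heads):
        g = h // group                       # GQA: consecutive query heads share one KV head
        qn = [rms_norm(x) for x in q[h]]     # QK-Norm applied to queries
        kn = [rms_norm(x) for x in k[g]]     # QK-Norm applied to keys
        head_out = []
        for t in range(seq):
            # Causal attention: position t sees positions 0..t only.
            scores = [
                sum(a * b for a, b in zip(qn[t], kn[s])) / math.sqrt(d)
                for s in range(t + 1)
            ]
            w = softmax(scores)
            head_out.append(
                [sum(w[s] * v[g][s][i] for s in range(t + 1)) for i in range(d)]
            )
        out.append(head_out)
    return out
```

Because keys and values exist only once per group, the cache holds `n_kv_heads` rather than `n_q_heads` vectors per layer, which is where GQA saves memory during decoding.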
Architecture Specifications
Parameters: 10B active / 230B total
Context Window: 197K
Decoder Type: MoE
Attention: GQA + QK-Norm
Active Parameters: 10B
Layers: 62
Hidden Size: 3,072
Vocabulary Size: 200K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown
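The parameter figures above imply how sparse the model is per token. A small sketch of that arithmetic follows; the helper name `active_fraction` is hypothetical, and the inputs are the rounded 10B and 230B figures from the table, so the result is approximate.

```python
def active_fraction(active_params, total_params):
    # Share of the model's weights that participate in one token's forward pass.
    return active_params / total_params

frac = active_fraction(10e9, 230e9)
print(f"{frac:.1%}")  # about 4.3% of parameters are active per token
```

Per-token compute therefore scales with roughly the 10B active parameters, while memory for weights still scales with the full 230B.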
Key Features
Grouped Query Attention
QK normalization
Layer mix: 62 GQA
KV cache: 248 KiB/token
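The 248 KiB/token KV cache figure can be reproduced from the layer count with a short calculation. The page does not state the KV head count, head dimension, or cache precision, so the values below (8 KV heads, head dimension 128, 2 bytes per element as in fp16/bf16) are assumptions chosen because they are consistent with the listed number; the function name is hypothetical, and 197,000 tokens is a rounded stand-in for the "197K" context.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # Each layer stores one key and one value vector per KV head for every token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# 62 layers is from the spec; the other three values are assumptions.
per_token = kv_cache_bytes_per_token(
    n_layers=62, n_kv_heads=8, head_dim=128, bytes_per_elem=2
)
print(per_token / 1024)  # 248.0 KiB per token

# Rough cache size for one sequence at full context (197,000 tokens assumed).
full_context = per_token * 197_000
print(f"{full_context / 2**30:.1f} GiB")  # about 46.6 GiB
```

Under these assumptions a single full-length sequence needs tens of GiB of cache on top of the weights, which is why the per-token figure matters for deployment planning.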
Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance for enterprise AI teams.