MoE
Unknown · 2026-02

MiniMax M2.5 230B

MoE decoder architecture with GQA + QK-Norm attention mechanism.

MiniMax M2.5 230B decoder block architecture. Attention: GQA + QK-Norm. Normalization: RMSNorm. FFN: Mixture of Experts (10B active parameters). Position encoding: RoPE. Scale: 230B total parameters, 197K context, 62 layers. Decoder type: MoE.

GQA + QK-Norm · MoE · 10B active
10B active / 230B total | 197K context | GQA + QK-Norm | MoE

Architecture Specifications

Parameters: 10B active / 230B total
Context Window: 197K
Decoder Type: MoE
Attention: GQA + QK-Norm
Active Parameters: 10B
Layers: 62
Hidden Size: 3,072
Vocabulary Size: 200K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown
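
The active and total parameter counts above imply how sparse the model is per token. The sketch below works that out from the listed figures. The compute estimate uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, which is an approximation brought in here, not a figure from this page, and it ignores the attention-score cost that grows with context length. The function names are illustrative only.

```python
# Figures taken from the specification table above.
TOTAL_PARAMS = 230e9   # 230B total parameters
ACTIVE_PARAMS = 10e9   # 10B parameters active per token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost per token.

    Assumption: ~2 FLOPs (one multiply, one add) per active parameter.
    This is a rule of thumb, not a measured number for this model.
    """
    return 2 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active fraction: {frac:.1%}")
    print(f"Approx. forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.1e}")
```

Under these figures only about one parameter in 23 participates in each token, which is why per-token compute tracks the 10B figure while memory for weights tracks the 230B figure.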

Key Features

Grouped Query Attention
QK normalization
Layer mix: 62 GQA
KV cache: 248 KiB/token
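
The KV cache figure can be cross-checked with the standard formula for GQA: two tensors (keys and values) per layer, each of size KV heads × head dimension. This page gives the layer count (62) but not the KV head count, head dimension, or cache precision. The values used below (8 KV heads, head dimension 128, 2 bytes per value) are assumptions chosen because they reproduce the listed 248 KiB/token, so treat this as a plausibility check and not as confirmed configuration.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int) -> int:
    """Bytes of KV cache added per generated or prefetched token."""
    # Factor of 2: one key vector and one value vector per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


LAYERS = 62            # from the specification table
KV_HEADS = 8           # ASSUMPTION: not stated on this page
HEAD_DIM = 128         # ASSUMPTION: not stated on this page
BYTES_PER_VALUE = 2    # ASSUMPTION: 16-bit cache (e.g. bf16/fp16)

per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
per_token_kib = per_token / 1024


def kv_cache_gib(context_tokens: int) -> float:
    """Total KV cache for one sequence of the given length, in GiB."""
    return per_token * context_tokens / 2**30


if __name__ == "__main__":
    print(f"KV cache per token: {per_token_kib:.0f} KiB")
    # 197_000 is a stand-in for the listed "197K" context; the exact
    # token count is not given on this page.
    print(f"KV cache at 197,000 tokens: {kv_cache_gib(197_000):.1f} GiB")
```

With these assumed values a single full-length sequence needs on the order of tens of GiB of cache on top of the weights, which is the practical reason the per-token figure matters for deployment sizing.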