MiniMax M2.5 230B

MiniMax M2.5 is a Mixture-of-Experts (MoE) decoder that uses grouped-query attention (GQA) with QK-Norm. It activates 10B of its 230B total parameters per token.
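As a rough illustration of what GQA with QK-Norm means in practice, here is a minimal NumPy sketch. It is not MiniMax's implementation. It assumes one common form of QK-Norm (RMS-normalizing queries and keys per head before the dot product, with the learned gain omitted), and the function names and tensor sizes are toy, hypothetical choices.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Scale the last axis to unit root-mean-square (learned gain omitted).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def gqa_qk_norm(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads  # query heads sharing one KV head
    # QK-Norm: normalize queries and keys before the dot product.
    q, k = rms_norm(q), rms_norm(k)
    # GQA: repeat each KV head so it lines up with its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))  # 8 query heads (toy sizes)
k = rng.standard_normal((2, 4, 16))  # 2 KV heads, each shared by 4 query heads
v = rng.standard_normal((2, 4, 16))
out = gqa_qk_norm(q, k, v)
print(out.shape)  # (8, 4, 16)
```

Only the keys and values of the 2 KV heads would need to be cached here, which is why GQA shrinks the KV cache relative to giving every query head its own keys and values.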


10B active / 230B total | 197K context | GQA + QK-Norm | MoE

Architecture Specifications

Parameters: 10B active / 230B total

Context Window: 197K

Decoder Type: MoE

Attention: GQA + QK-Norm

Active Parameters: 10B

Layers: 62

Hidden Size: 3,072

Vocabulary Size: 200K

Release Date: 2026-02

Category: Mixture of Experts

Organization: Unknown

Grouped Query Attention · QK normalization · Layer mix: 62 GQA · KV cache: 248 KiB/token
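The 248 KiB/token figure can be sanity-checked with the usual KV-cache formula: two tensors (keys and values) per layer, per KV head, per head dimension, times the bytes per stored value. The sketch below takes the layer count from the specifications. The number of KV heads (8), the head dimension (128), and 2-byte cache entries are assumptions that are not stated in the source; they are simply one combination that reproduces the listed figure. The context length of 197,000 tokens is likewise an assumed reading of "197K".

```python
# Values taken from the specification list.
LAYERS = 62
KV_CACHE_PER_TOKEN_KIB = 248

# Assumptions, NOT stated in the source: 8 KV heads, head dim 128,
# 2-byte (bf16/fp16) cache entries.
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2

def kv_cache_bytes_per_token(layers, n_kv_heads, head_dim, bytes_per_value):
    # Each layer stores one key vector and one value vector per KV head.
    return 2 * layers * n_kv_heads * head_dim * bytes_per_value

per_token = kv_cache_bytes_per_token(LAYERS, N_KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
print(per_token / 1024)  # 248.0

# Full-context cost for a single sequence, reading "197K" as 197,000 tokens
# (an assumption; the exact token count is not given).
context_tokens = 197_000
total_gib = context_tokens * KV_CACHE_PER_TOKEN_KIB * 1024 / 2**30
print(round(total_gib, 1))  # 46.6
```

Under these assumptions, one sequence at the full context window would need roughly 46.6 GiB of KV cache in addition to the model weights.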

Enterprise AI platform

Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance for enterprise AI teams.