MoE
Mistral · 2026-03
Mistral Small 4
Mixture-of-Experts (MoE) decoder architecture with Multi-head Latent Attention (MLA).
Mistral Small 4 decoder block architecture. Attention: MLA. Normalization: RMSNorm. FFN: Mixture of Experts (6.63B active parameters). Position encoding: RoPE. Scale: 119B total parameters, 256K context, 96 layers. Decoder type: MoE.
6.63B active / 119B total | 256K context | MLA | MoE
Architecture Specifications
Parameters: 6.63B active / 119B total
Context Window: 256K
Decoder Type: MoE
Attention: MLA
Active Parameters: 6.63B
Release Date: 2026-03
Category: Mixture of Experts
Organization: Mistral
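To put the parameter counts above in perspective, the following minimal sketch computes what share of the 119B total parameters is active per token. The constant and function names are illustrative and not taken from any Mistral tooling; the only inputs are the two figures listed in the specifications.

```python
# Figures taken from the specification table (in billions of parameters).
ACTIVE_PARAMS_B = 6.63
TOTAL_PARAMS_B = 119.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used per token in an MoE model."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {frac:.2%}")  # prints "Active share per token: 5.57%"
```

Roughly one parameter in eighteen takes part in any single forward pass, which is why per-token compute tracks the 6.63B figure while memory for weights tracks the 119B figure.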
Key Features
Multi-head Latent Attention
Expert routing
Layer mix: 36 MLA
KV cache: 22.5 KiB/token
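The KV cache figure under Key Features can be turned into a rough memory estimate for a full-length sequence. This sketch assumes that "256K" means 256 × 1024 = 262,144 tokens and that 22.5 KiB/token covers the whole model for a single sequence; neither assumption is stated on this page, and the names below are illustrative.

```python
KIB = 1024
GIB = 1024 ** 3

# Listed on this page: KV cache of 22.5 KiB per token.
KV_BYTES_PER_TOKEN = 22.5 * KIB

# Assumption: "256K context" is read as 256 * 1024 tokens.
CONTEXT_TOKENS = 256 * 1024


def kv_cache_bytes(tokens: int, batch_size: int = 1) -> float:
    """Estimate KV cache size in bytes for `batch_size` sequences of `tokens` tokens."""
    return KV_BYTES_PER_TOKEN * tokens * batch_size


full_context_gib = kv_cache_bytes(CONTEXT_TOKENS) / GIB
print(f"KV cache at full context: {full_context_gib} GiB")  # prints 5.625 GiB
```

Under these assumptions a single sequence at the full context window needs about 5.6 GiB of KV cache, and the estimate scales linearly with batch size.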