Mistral · 2026-03

Mistral Small 4

Mixture-of-Experts (MoE) decoder architecture with Multi-head Latent Attention (MLA).

Figure: Mistral Small 4 decoder block architecture. Attention: MLA. Normalization: RMSNorm. FFN: Mixture of Experts (6.63B active parameters). Position encoding: RoPE. Scale: 119B total parameters, 256K context, 96 layers. Decoder type: MoE.


Architecture Specifications

Parameters: 6.63B active / 119B total
Context Window: 256K
Decoder Type: MoE
Attention: MLA
Active Parameters: 6.63B
Release Date: 2026-03
Category: Mixture of Experts
Organization: Mistral
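
To put the active/total split in perspective, the short sketch below computes the share of parameters used per token from the two figures listed above. The constant and function names are illustrative only, and the result is plain arithmetic on the published numbers rather than a statement about how routing is implemented.

```python
# Figures taken from the specification list above (in billions of parameters).
ACTIVE_PARAMS_B = 6.63
TOTAL_PARAMS_B = 119.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of total parameters that are active per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {fraction:.2%}")
```

Roughly one parameter in eighteen takes part in any single forward pass, which is why per-token compute tracks the 6.63B figure while memory for the weights tracks the 119B figure.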

Key Features

Multi-head Latent Attention
Expert routing
Layer mix: 36 MLA
KV cache: 22.5 KiB/token
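
The per-token KV cache figure can be turned into a rough memory estimate for a given sequence length. The sketch below is a minimal illustration, assuming that "256K context" means 256 × 1024 tokens and that the 22.5 KiB/token figure scales linearly with tokens and batch size; the helper name `kv_cache_bytes` is hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

# Listed under Key Features: 22.5 KiB of KV cache per token.
KV_BYTES_PER_TOKEN = 22.5 * KIB

# Assumption: "256K context" is read as 256 * 1024 tokens.
CONTEXT_TOKENS = 256 * 1024


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Estimate KV cache size in bytes, assuming linear scaling."""
    if num_tokens < 0 or batch_size < 1:
        raise ValueError("num_tokens must be >= 0 and batch_size >= 1")
    return int(KV_BYTES_PER_TOKEN * num_tokens * batch_size)


full_context_bytes = kv_cache_bytes(CONTEXT_TOKENS)
print(f"KV cache at full context: {full_context_bytes / GIB:.3f} GiB")
```

Under these assumptions a single sequence at full context needs about 5.6 GiB of KV cache, on top of the memory for the weights themselves.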