MoE
Unknown · 2026-03
Sarvam 105B
MoE decoder architecture with an MLA + KV LayerNorm + NoPE + RoPE attention mechanism.
Sarvam 105B decoder block architecture: Attention: MLA + KV LayerNorm + NoPE + RoPE. Normalization: RMSNorm. FFN: Mixture of Experts (10.3B active parameters). Position encoding: RoPE. Scale: 105B, 131K context, 32 layers. Decoder type: MoE.
MLA + KV LayerNorm + NoPE + RoPE · MoE · 10.3B active
10.3B active / 105B total | 131K context | MLA + KV LayerNorm + NoPE + RoPE | MoE
Architecture Specifications
Parameters: 10.3B active / 105B total
Context Window: 131K
Decoder Type: MoE
Attention: MLA + KV LayerNorm + NoPE + RoPE
Active Parameters: 10.3B
Layers: 32
Hidden Size: 4,096
Vocabulary Size: 262K
Release Date: 2026-03
Category: Mixture of Experts
Organization: Unknown
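The gap between active and total parameters is what distinguishes a Mixture of Experts model from a dense one: only a subset of the weights runs for each token. As a rough illustration, the short Python sketch below computes that share from the two figures listed above. The function name `active_fraction` is hypothetical and exists only for this example; the result is simple arithmetic on the published numbers, not a measured compute cost.

```python
# Figures taken from the specification list above (in billions of parameters).
TOTAL_PARAMS_B = 105.0
ACTIVE_PARAMS_B = 10.3


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters that are active per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{fraction:.1%} of parameters are active per token")
```

Under these figures, roughly one tenth of the weights participate in any single token's forward pass, although all 105B still have to be held in memory for serving.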
Key Features
Multi-head Latent Attention
Expert routing
Layer mix: 32 MLA
KV cache: 36 KiB/token
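The per-token KV cache figure can be turned into a memory estimate for a full context window. The Python sketch below does that arithmetic under two labeled assumptions that the page does not state: that "131K" means 131,072 tokens, and that cached values are stored in 2 bytes each (for example bf16). The helper `kv_cache_bytes` is a hypothetical name used only here.

```python
KIB = 1024
GIB = 1024 ** 3

# From the feature list above.
KV_CACHE_PER_TOKEN_BYTES = 36 * KIB
NUM_LAYERS = 32

# Assumptions, not stated on the page.
CONTEXT_TOKENS = 131_072   # assumes "131K" means 128 * 1024 tokens
BYTES_PER_ELEMENT = 2      # assumes a 16-bit cache dtype such as bf16


def kv_cache_bytes(tokens: int, per_token_bytes: int = KV_CACHE_PER_TOKEN_BYTES) -> int:
    """KV cache size in bytes for one sequence of the given length."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    return tokens * per_token_bytes


full_context_gib = kv_cache_bytes(CONTEXT_TOKENS) / GIB
per_layer_bytes = KV_CACHE_PER_TOKEN_BYTES // NUM_LAYERS
per_layer_elements = per_layer_bytes // BYTES_PER_ELEMENT

print(f"Full-context KV cache per sequence: {full_context_gib:.2f} GiB")
print(f"Per layer, per token: {per_layer_bytes} bytes ({per_layer_elements} cached values)")
```

If both assumptions hold, a single full-length sequence needs about 4.5 GiB of KV cache, and each layer stores 1,152 bytes (576 values) per token. Batch size multiplies the first number directly.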