Sarvam 105B

Mixture-of-Experts (MoE) decoder that uses Multi-head Latent Attention (MLA) with KV LayerNorm, NoPE, and RoPE.


10.3B active / 105B total | 131K context | MLA + KV LayerNorm + NoPE + RoPE | MoE

Architecture Specifications

Parameters: 10.3B active / 105B total
Context Window: 131K
Decoder Type: MoE
Attention: MLA + KV LayerNorm + NoPE + RoPE
Active Parameters: 10.3B
Layers: 32
Hidden Size: 4,096
Vocabulary Size: 262K
Release Date: 2026-03
Category: Mixture of Experts
Organization: Unknown
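
The parameter figures above can be turned into two rough sizing numbers: the share of weights used per token, and the memory needed just to hold the weights. The sketch below is illustrative only. The function names are hypothetical, and the 2 bytes per parameter figure is an assumption (for example bf16), since the page does not state a serving precision.

```python
# Figures taken from the specification list above.
TOTAL_PARAMS = 105e9    # 105B total
ACTIVE_PARAMS = 10.3e9  # 10.3B active per token


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that take part in one token's forward pass."""
    return active / total


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold `params` weights at the given precision, in GiB."""
    return params * bytes_per_param / 2**30


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"active fraction: {frac:.1%}")

# Assumption: 2 bytes per parameter; the page does not state a precision.
print(f"weights at 2 bytes/param: {weight_memory_gib(TOTAL_PARAMS, 2):.0f} GiB")
```

Roughly one tenth of the weights are active per token, but all 105B still have to be resident (or quickly reachable) for routing to work, so the weight-memory figure is driven by the total count, not the active count.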

Multi-head Latent Attention | Expert routing | Layer mix: 32 MLA | KV cache: 36 KiB/token
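
As a rough illustration of what 36 KiB/token means in practice, the following sketch scales the per-token figure to a full context window and breaks it down per layer. It rests on two assumptions not stated on the page: that "131K" means 131,072 tokens (2**17), and that cached values are stored at 2 bytes per element. The function name is hypothetical.

```python
# Figures from the specification list; the assumptions are marked below.
KV_BYTES_PER_TOKEN = 36 * 1024  # 36 KiB/token, summed over all layers
NUM_LAYERS = 32                 # all 32 layers use MLA
CONTEXT_TOKENS = 131_072        # assumption: "131K" read as 2**17 tokens
BYTES_PER_ELEMENT = 2           # assumption: 16-bit cache entries


def kv_cache_gib(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> float:
    """KV cache size for a single sequence of `tokens` tokens, in GiB."""
    return tokens * bytes_per_token / 2**30


# Per-layer share of the per-token cache, in bytes and in cached elements.
per_layer_bytes = KV_BYTES_PER_TOKEN // NUM_LAYERS
per_layer_elems = per_layer_bytes // BYTES_PER_ELEMENT

print(f"per layer, per token: {per_layer_bytes} bytes ({per_layer_elems} elements)")
print(f"full context, one sequence: {kv_cache_gib(CONTEXT_TOKENS)} GiB")
```

Under these assumptions a single full-length sequence needs 4.5 GiB of KV cache, and each layer caches 576 elements per token. The total scales linearly with the number of concurrent sequences, so batch size matters more for cache memory than the per-token figure alone suggests.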
