MoE
Unknown · 2026-03

Sarvam 105B

A Mixture-of-Experts (MoE) decoder architecture whose attention mechanism combines MLA, KV LayerNorm, NoPE, and RoPE.

Sarvam 105B decoder block architecture: Attention: MLA + KV LayerNorm + NoPE + RoPE. Normalization: RMSNorm. FFN: Mixture of Experts (10.3B active parameters). Position encoding: RoPE. Scale: 105B, 131K context, 32 layers. Decoder type: MoE.

10.3B active / 105B total | 131K context | MLA + KV LayerNorm + NoPE + RoPE | MoE

Architecture Specifications

Parameters: 10.3B active / 105B total
Context Window: 131K
Decoder Type: MoE
Attention: MLA + KV LayerNorm + NoPE + RoPE
Active Parameters: 10.3B
Layers: 32
Hidden Size: 4,096
Vocabulary Size: 262K
Release Date: 2026-03
Category: Mixture of Experts
Organization: Unknown
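
The specification values above imply a few derived figures that are handy when sizing a deployment. The following is a minimal sketch, not an official calculation: it assumes that "262K" means 2**18 = 262,144 vocabulary entries and counts a single token-embedding matrix (the page does not say whether input and output embeddings are shared). All variable names are illustrative.

```python
# Back-of-envelope figures derived from the listed specifications.
TOTAL_PARAMS = 105e9      # 105B total parameters (from the spec list)
ACTIVE_PARAMS = 10.3e9    # 10.3B active parameters per token (from the spec list)
HIDDEN_SIZE = 4096        # hidden size (from the spec list)
VOCAB_SIZE = 262_144      # assumption: "262K" read as 2**18

# Share of the model's weights that take part in a single forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Size of one vocabulary-by-hidden embedding matrix.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

print(f"active fraction: {active_fraction:.1%}")
print(f"embedding matrix: {embedding_params / 1e9:.2f}B parameters")
```

Under these assumptions roughly one tenth of the weights are active per token, and a single embedding matrix accounts for about 1.07B parameters.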

Key Features

Multi-head Latent Attention
Expert routing
Layer mix: 32 MLA
KV cache: 36 KiB/token
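
The KV cache figure can be turned into memory estimates with simple arithmetic. The sketch below is illustrative only: it assumes "131K" means 2**17 = 131,072 tokens, that the 36 KiB is spread evenly over the 32 MLA layers, and that cache entries are stored as 16-bit values. None of these assumptions is confirmed by the page, and the names are hypothetical.

```python
KIB = 1024
KV_BYTES_PER_TOKEN = 36 * KIB   # 36 KiB/token (from the feature list)
NUM_LAYERS = 32                 # 32 MLA layers (from the feature list)
CONTEXT_TOKENS = 131_072        # assumption: "131K" read as 2**17
BYTES_PER_ELEMENT = 2           # assumption: 16-bit cache entries


def kv_cache_bytes(num_tokens: int) -> int:
    """Total KV cache size in bytes for one sequence of num_tokens tokens."""
    return num_tokens * KV_BYTES_PER_TOKEN


# Per-layer share of the per-token cache, assuming an even split across layers.
per_layer_bytes = KV_BYTES_PER_TOKEN // NUM_LAYERS
per_layer_elements = per_layer_bytes // BYTES_PER_ELEMENT

# Cache footprint of a single sequence that fills the whole context window.
full_context_gib = kv_cache_bytes(CONTEXT_TOKENS) / 1024**3

print(f"per layer: {per_layer_bytes} bytes ({per_layer_elements} elements) per token")
print(f"full context: {full_context_gib:.1f} GiB per sequence")
```

With these assumptions each layer caches 1,152 bytes (576 16-bit values) per token, and one sequence at full context needs about 4.5 GiB of cache. Multiply by the number of concurrent sequences to estimate serving memory.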