NVIDIA · 2026-03

Nemotron 3 Super 120B-A12B

Mixture-of-Experts (MoE) decoder architecture whose sequence-mixing layers are mostly Mamba-2, interleaved with a few grouped-query attention (GQA) layers.

Decoder block architecture of Nemotron 3 Super 120B-A12B. Attention: mostly Mamba-2 layers, plus a few GQA layers. Normalization: RMSNorm. FFN: Mixture of Experts (12B active parameters). Position encoding: RoPE. Scale: 120B total parameters, 1M-token context, 88 layers. Decoder type: MoE.
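The FFN is described only as a Mixture of Experts with 12B active parameters; this page does not give the number of experts, how many are selected per token, or the gating function. The sketch below shows generic top-k expert routing under those unknowns: a router scores every expert, keeps the `top_k` highest-scoring ones, renormalizes their scores with a softmax, and returns the gate-weighted sum of only the selected experts' outputs. The names `moe_forward`, `router_weights`, and `top_k`, and the choice of softmax over the selected experts, are illustrative assumptions, not NVIDIA's implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_weights: List[Vector],          # hypothetical: one router row per expert
    experts: List[Callable[[Vector], Vector]],
    top_k: int,                            # hypothetical: experts used per token
) -> Vector:
    """Generic top-k MoE layer for a single token (illustrative sketch)."""
    if not 1 <= top_k <= len(experts):
        raise ValueError("top_k must be between 1 and the number of experts")
    # Router logits: one dot product between the token and each expert's router row.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep the indices of the top_k largest logits.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gates are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)  # only the selected experts are ever evaluated
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out


if __name__ == "__main__":
    # Four toy experts that scale their input by 1, 2, 3, and 4.
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.5, 0.0]]
    print(moe_forward([1.0, 0.0], router, experts, top_k=2))  # approximately [2.731, 0.0]
```

With router logits of 0, 1, 2, and 0.5, `top_k=2` selects experts 2 and 1, and the other two experts are never called. Skipping unselected experts is what lets a model's active parameter count be a small fraction of its total.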


Architecture Specifications

Parameters: 12B active / 120B total
Context window: 1M tokens
Decoder type: MoE
Attention: mostly Mamba-2, plus a few GQA layers
Active parameters: 12B
Layers: 88
Hidden size: 4,096
Vocabulary size: 131K
Release date: 2026-03
Category: Hybrid architecture
Organization: NVIDIA
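Because only the routed experts run for each token, the 12B-active / 120B-total split mainly reduces compute rather than storage: every expert's weights must still be held in memory. The arithmetic below makes that concrete. The bytes-per-parameter values (2 for bf16, 1 for an 8-bit format) and the common rule of thumb of about 2 FLOPs per active parameter per token are assumptions, not figures from this page, and the headline parameter counts are rounded, so treat the results as estimates.

```python
# Headline figures from the spec table (rounded values).
TOTAL_PARAMS = 120e9
ACTIVE_PARAMS = 12e9


def weight_gigabytes(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Fraction of parameters used for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.1

# Assumed storage precisions; all experts must be resident, so use TOTAL_PARAMS.
bf16_weights_gb = weight_gigabytes(TOTAL_PARAMS, 2)        # 240.0
eight_bit_weights_gb = weight_gigabytes(TOTAL_PARAMS, 1)   # 120.0

# Rule of thumb (assumption): ~2 FLOPs per active parameter per token in the
# forward pass, ignoring attention-score and Mamba-2 state-update costs.
approx_flops_per_token = 2 * ACTIVE_PARAMS  # 2.4e10

if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.0%}")
    print(f"weights, bf16: {bf16_weights_gb:.0f} GB; 8-bit: {eight_bit_weights_gb:.0f} GB")
    print(f"approx. forward FLOPs per token: {approx_flops_per_token:.2e}")
```

Per-token compute therefore scales roughly like a 12B dense model, while weight memory scales like a 120B dense model.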

Key Features

Grouped-query attention
Expert routing
Layer mix: 8 GQA + 40 Mamba-2 + 40 MoE
KV cache: 8 KiB/token
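The layer mix and the KV-cache figure can be checked with simple arithmetic. The 8 GQA, 40 Mamba-2, and 40 MoE layers sum to the 88 layers in the specification table, and since Mamba-2 layers keep a fixed-size recurrent state instead of a per-token cache, only the 8 GQA layers add to the KV cache. This sketch assumes "1M" means 2^20 tokens and that the 8 KiB/token is split evenly across the GQA layers. The head configuration used below (2 KV heads, head dimension 128, 2-byte bf16 entries) is a hypothetical choice that happens to reproduce 8 KiB/token, not a published value.

```python
KIB = 1024
GIB = 1024**3

# Figures from the page.
GQA_LAYERS, MAMBA2_LAYERS, MOE_LAYERS = 8, 40, 40
KV_BYTES_PER_TOKEN = 8 * KIB

# Assumption: "1M context" read as 2**20 tokens.
FULL_CONTEXT_TOKENS = 2**20


def kv_cache_bytes(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> int:
    """KV cache size for a single sequence of `tokens` tokens."""
    return tokens * bytes_per_token


def kv_bytes_per_token(layers: int, n_kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """Per-token KV cache: one K and one V vector per KV head, per attention layer."""
    return 2 * layers * n_kv_heads * head_dim * bytes_per_elem


total_layers = GQA_LAYERS + MAMBA2_LAYERS + MOE_LAYERS          # 88
per_gqa_layer_bytes = KV_BYTES_PER_TOKEN // GQA_LAYERS          # 1024 (1 KiB)
full_context_gib = kv_cache_bytes(FULL_CONTEXT_TOKENS) / GIB    # 8.0

# Hypothetical head configuration (NOT from the page) giving 8 KiB/token.
hypothetical_bytes = kv_bytes_per_token(GQA_LAYERS, n_kv_heads=2, head_dim=128, bytes_per_elem=2)  # 8192

# Hypothetical contrast: the same per-layer cost paid by all 88 layers.
all_attention_gib = kv_cache_bytes(FULL_CONTEXT_TOKENS, per_gqa_layer_bytes * total_layers) / GIB  # 88.0

if __name__ == "__main__":
    print(total_layers, per_gqa_layer_bytes, full_context_gib, hypothetical_bytes, all_attention_gib)
```

Under these assumptions a full 1M-token sequence needs 8 GiB of KV cache, versus about 88 GiB for a hypothetical model in which every one of the 88 layers were an attention layer with the same per-layer cost.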