MoE
NVIDIA · 2026-03
Nemotron 3 Super 120B-A12B
MoE decoder architecture that combines mostly Mamba-2 layers with a few GQA attention layers.
Nemotron 3 Super 120B-A12B decoder block architecture. Attention: mostly Mamba-2 plus a few GQA layers. Normalization: RMSNorm. FFN: Mixture of Experts (12B active parameters). Position encoding: RoPE. Scale: 120B total parameters, 1M context, 88 layers. Decoder type: MoE.
12B active / 120B total | 1M context | Mostly Mamba-2 + a few GQA layers | MoE
Architecture Specifications
Parameters: 12B active / 120B total
Context Window: 1M
Decoder Type: MoE
Attention: Mostly Mamba-2 + a few GQA layers
Active Parameters: 12B
Layers: 88
Hidden Size: 4,096
Vocabulary Size: 131K
Release Date: 2026-03
Category: Hybrid Architecture
Organization: NVIDIA
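The active and total parameter counts above can be turned into a rough per-token compute share and a weight-memory estimate. The sketch below is only illustrative arithmetic: the function names are hypothetical, and the 2 bytes per parameter (a 16-bit format such as BF16) is an assumption, since this card does not state a storage precision.

```python
# Parameter counts taken from the specification list above.
TOTAL_PARAMS = 120_000_000_000   # 120B total
ACTIVE_PARAMS = 12_000_000_000   # 12B active per token


def active_fraction(active: int, total: int) -> float:
    """Share of the parameters that participate in one forward pass per token."""
    return active / total


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Raw weight storage, ignoring activations, caches, and runtime overhead."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active fraction: {frac:.0%}")

    # ASSUMPTION: 2 bytes per parameter (16-bit weights); not stated on this card.
    total_gb = weight_bytes(TOTAL_PARAMS, 2) / 1e9
    print(f"weights at 16-bit: {total_gb:.0f} GB")
```

All 120B parameters still have to be resident in memory even though only about a tenth of them are used for any given token, which is why the two figures are worth reading together.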
Key Features
Grouped Query Attention
Expert routing
Layer mix: 8 GQA + 40 Mamba-2 + 40 MoE
KV cache: 8 KiB/token
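The layer mix and the per-token KV cache figure lend themselves to a quick sanity check. The sketch below verifies that the listed mix adds up to the 88 layers in the specifications and estimates the KV cache for a full context window. It assumes "1M" means 2**20 tokens and that the 8 KiB/token figure covers the whole model for a single sequence; neither is stated on this card, and the helper names are hypothetical.

```python
# Figures taken from the key features and specifications above.
KV_BYTES_PER_TOKEN = 8 * 1024                      # "KV cache: 8 KiB/token"
LAYER_MIX = {"GQA": 8, "Mamba-2": 40, "MoE": 40}   # "Layer mix"
TOTAL_LAYERS = 88                                  # "Layers"


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """KV cache size in bytes for `batch_size` sequences of `num_tokens` tokens."""
    return KV_BYTES_PER_TOKEN * num_tokens * batch_size


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # The listed layer mix should account for every layer.
    print(sum(LAYER_MIX.values()) == TOTAL_LAYERS)

    # ASSUMPTION: a 1M context is 2**20 = 1,048,576 tokens.
    full_context = 2**20
    print(f"{to_gib(kv_cache_bytes(full_context)):.1f} GiB per sequence")

    # ASSUMPTION: only the GQA layers hold a per-token cache, because
    # Mamba-2 layers keep a fixed-size recurrent state instead.
    print(KV_BYTES_PER_TOKEN // LAYER_MIX["GQA"], "bytes per GQA layer per token")
```

Under these assumptions a single full-length sequence needs about 8 GiB of KV cache, and the cost scales linearly with both context length and batch size.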