Nemotron 3 Super 120B-A12B

Mixture-of-experts (MoE) decoder that uses mostly Mamba-2 layers plus a few grouped-query attention (GQA) layers.

12B active / 120B total | 1M context | Mostly Mamba-2 + a few GQA layers | MoE

Architecture Specifications

Parameters: 12B active / 120B total
Context Window: 1M
Decoder Type: MoE
Attention: Mostly Mamba-2 + a few GQA layers
Active Parameters: 12B
Layers: 88
Hidden Size: 4,096
Vocabulary Size: 131K
Release Date: 2026-03
Category: Hybrid Architecture
Organization: NVIDIA
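
As a rough sizing aid, the parameter counts above can be turned into an active-parameter fraction and a weights-only memory estimate. The sketch below is illustrative arithmetic, not a published figure: the bit widths are assumed precisions, the helper names are hypothetical, and the estimate ignores activations, KV cache, and runtime overhead.

```python
# Parameter counts taken from the specification list above.
TOTAL_PARAMS = 120_000_000_000
ACTIVE_PARAMS = 12_000_000_000


def active_fraction(active: int = ACTIVE_PARAMS, total: int = TOTAL_PARAMS) -> float:
    """Share of parameters used per token in the MoE forward pass."""
    return active / total


def weight_bytes(params: int, bits_per_param: int) -> int:
    """Weights-only storage in bytes for an ASSUMED precision (bits per parameter)."""
    return params * bits_per_param // 8


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.0%}")
    # 16, 8 and 4 bits are assumed precisions, not values from the listing.
    for bits in (16, 8, 4):
        gb = weight_bytes(TOTAL_PARAMS, bits) / 1e9  # decimal gigabytes
        print(f"{bits}-bit weights: {gb:.0f} GB")
```

At an assumed 16 bits per parameter, all 120B weights take about 240 GB, even though only 12B parameters (10%) are active for any given token. All experts generally need to be resident in memory, so the total count, not the active count, drives the weight footprint.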

Grouped Query Attention | Expert routing | Layer mix: 8 GQA + 40 Mamba-2 + 40 MoE | KV cache: 8 KiB/token
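
The 8 KiB/token figure can be scaled to a full context window with simple arithmetic. The sketch below assumes that "1M" means 2^20 = 1,048,576 tokens and that the per-token figure covers keys and values for the 8 GQA layers only (Mamba-2 layers keep a fixed-size state rather than a cache that grows per token). The `kv_heads`, `head_dim`, and `bytes_per_value` values are hypothetical: they were chosen only because they reproduce 8 KiB, and the listing does not state them.

```python
KIB = 1024
GIB = 1024 ** 3

# Layer mix and total layer count as listed on this page.
LAYER_MIX = {"gqa": 8, "mamba2": 40, "moe": 40}
TOTAL_LAYERS = 88


def kv_cache_bytes(tokens: int, per_token_bytes: int = 8 * KIB) -> int:
    """Total KV cache for one sequence, given the listed per-token cost."""
    return tokens * per_token_bytes


def gqa_bytes_per_token(gqa_layers: int, kv_heads: int, head_dim: int,
                        bytes_per_value: int) -> int:
    """Per-token KV cost: keys and values (factor 2) for every attention layer."""
    return 2 * gqa_layers * kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # The listed layer mix should add up to the listed layer count.
    assert sum(LAYER_MIX.values()) == TOTAL_LAYERS

    # ASSUMPTION: 1M context = 2**20 tokens.
    print(kv_cache_bytes(2 ** 20) / GIB, "GiB at full context")

    # HYPOTHETICAL shape (2 KV heads, head_dim 128, 2-byte values) that
    # happens to match 8 KiB/token; not confirmed by the listing.
    print(gqa_bytes_per_token(LAYER_MIX["gqa"], 2, 128, 2) / KIB, "KiB/token")
```

Under these assumptions a single full-length sequence needs 8 GiB of KV cache. If "1M" instead means 1,000,000 tokens, the total is slightly lower (about 7.63 GiB).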
