Nemotron 3 Nano 4B

A hybrid decoder architecture that uses grouped-query attention (GQA) in only 4 of its layers.

GQA (only 4 attention layers) · SwiGLU

4B | 262K context | GQA (only 4 attention layers) | Hybrid

Architecture Specifications

Parameters: 4B
Context Window: 262K
Decoder Type: Hybrid
Attention: GQA (only 4 attention layers)
Layers: 42
Hidden Size: 3,136
Vocabulary Size: 131K
Release Date: 2026-03
Category: Hybrid Architecture
Organization: NVIDIA

Grouped Query Attention

Layer mix: 4 GQA + 21 Mamba-2 + 17 FFN
KV cache: 16 KiB/token
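
The layer mix and the KV cache figure can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function name, the head configuration (8 KV heads of dimension 128), the 2-byte cache precision, and the reading of "262K" as 262,144 tokens are all assumptions that are not stated in the card. They are chosen because they reproduce the listed 16 KiB/token when only the 4 GQA layers store keys and values.

```python
# Illustrative sketch. The head configuration and precision below are
# assumptions, not values published in the spec card.

KIB = 1024
GIB = 1024 ** 3


def kv_cache_bytes_per_token(attn_layers, kv_heads, head_dim, bytes_per_elem):
    """Bytes of keys plus values stored per token across all attention layers."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem


# Layer mix as listed in the card.
layer_mix = {"GQA": 4, "Mamba-2": 21, "FFN": 17}
total_layers = sum(layer_mix.values())

# Assumed: 8 KV heads x 128 head dim, 2-byte (bf16/fp16) cache entries.
per_token = kv_cache_bytes_per_token(
    attn_layers=layer_mix["GQA"], kv_heads=8, head_dim=128, bytes_per_elem=2
)

# Assumed: "262K" means 262,144 tokens (2**18).
context_tokens = 262_144
full_context_cache = per_token * context_tokens

print(total_layers)                  # 42
print(per_token / KIB)               # 16.0
print(full_context_cache / GIB)      # 4.0
```

Under these assumptions, the layer counts sum to the listed 42 layers, and a full 262K-token context would need roughly 4 GiB of KV cache per sequence. Only the attention layers contribute to this figure; Mamba-2 layers carry a fixed-size recurrent state that does not grow with context length.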

Enterprise AI platform

Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance for enterprise AI teams.