Hybrid
NVIDIA · 2026-03
Nemotron 3 Nano 4B
Hybrid decoder architecture that uses grouped-query attention (GQA) in only 4 of its layers.
Nemotron 3 Nano 4B decoder block architecture: Attention: GQA + only 4 attention layers. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 4B, 262K context, 42 layers. Decoder type: Hybrid.
Architecture Specifications
Parameters: 4B
Context Window: 262K
Decoder Type: Hybrid
Attention: GQA + only 4 attention layers
Layers: 42
Hidden Size: 3,136
Vocabulary Size: 131K
Release Date: 2026-03
Category: Hybrid Architecture
Organization: NVIDIA
Key Features
Grouped Query Attention
Layer mix: 4 GQA + 21 Mamba-2 + 17 FFN
KV cache: 16 KiB/token
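The layer mix and KV cache figures above lend themselves to a quick sanity check. The following is a minimal sketch, not an official sizing tool. It assumes that "262K context" means 262,144 (2**18) tokens and that only the 4 GQA layers contribute to the per-token KV cache, with the Mamba-2 and FFN layers adding nothing that grows with sequence length. All constant and function names are hypothetical and chosen for illustration.

```python
# Values taken from the spec card.
LAYER_MIX = {"gqa_attention": 4, "mamba2": 21, "ffn": 17}
TOTAL_LAYERS = 42
KV_BYTES_PER_TOKEN = 16 * 1024  # 16 KiB/token

# Assumption: "262K context" is 2**18 = 262,144 tokens.
CONTEXT_TOKENS = 2**18


def kv_cache_bytes(num_tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> int:
    """KV cache size in bytes for one sequence of num_tokens tokens."""
    return num_tokens * bytes_per_token


def kv_bytes_per_attention_layer() -> int:
    """Per-token KV bytes per attention layer.

    Assumption: the cache is split evenly across the 4 GQA layers.
    """
    return KV_BYTES_PER_TOKEN // LAYER_MIX["gqa_attention"]


if __name__ == "__main__":
    # The three layer types should add up to the stated 42 layers.
    print(sum(LAYER_MIX.values()) == TOTAL_LAYERS)  # True

    full = kv_cache_bytes(CONTEXT_TOKENS)
    print(f"{full / 2**30:.1f} GiB at full context")  # 4.0 GiB at full context

    print(kv_bytes_per_attention_layer(), "bytes per token per attention layer")  # 4096 ...
```

Under these assumptions, a single sequence at the full context window would need about 4 GiB of KV cache, and each attention layer would store 4 KiB per token.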
Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance for enterprise AI teams.