Hybrid
NVIDIA · 2026-03

Nemotron 3 Nano 4B

Hybrid decoder architecture that uses grouped-query attention (GQA) in only 4 of its 42 layers.

Nemotron 3 Nano 4B decoder block architecture. Attention: GQA, in only 4 attention layers. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 4B parameters, 262K context, 42 layers. Decoder type: Hybrid.

Architecture Specifications

Parameters: 4B
Context Window: 262K
Decoder Type: Hybrid
Attention: GQA + only 4 attention layers
Layers: 42
Hidden Size: 3,136
Vocabulary Size: 131K
Release Date: 2026-03
Category: Hybrid Architecture
Organization: NVIDIA

Key Features

Grouped Query Attention
Layer mix: 4 GQA + 21 Mamba-2 + 17 FFN
KV cache: 16 KiB/token
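
The layer mix and KV cache figures above can be checked with a little arithmetic. The sketch below is illustrative only: the page does not state the number of KV heads, the head dimension, or the cache precision, so the values used here (8 KV heads, head dimension 128, 2 bytes per value) are assumptions chosen because they are one configuration that reproduces 16 KiB/token. It also assumes "262K" means 262,144 tokens. The function names are hypothetical.

```python
# Figures taken from the spec card.
ATTENTION_LAYERS = 4
MAMBA2_LAYERS = 21
FFN_LAYERS = 17
TOTAL_LAYERS = 42
KV_CACHE_BYTES_PER_TOKEN = 16 * 1024  # 16 KiB/token

# Assumptions (not stated on the page).
ASSUMED_KV_HEADS = 8
ASSUMED_HEAD_DIM = 128
ASSUMED_BYTES_PER_VALUE = 2      # e.g. a 16-bit cache
ASSUMED_CONTEXT_TOKENS = 262_144  # reading "262K" as 2**18


def kv_bytes_per_token(attn_layers, kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache one token needs: a key and a value vector
    per KV head in every attention layer."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_value


def full_context_cache_gib(bytes_per_token, context_tokens):
    """KV cache size in GiB for a single sequence of the given length."""
    return bytes_per_token * context_tokens / 2**30


if __name__ == "__main__":
    # The layer mix should account for every layer.
    print(ATTENTION_LAYERS + MAMBA2_LAYERS + FFN_LAYERS == TOTAL_LAYERS)

    per_token = kv_bytes_per_token(
        ATTENTION_LAYERS, ASSUMED_KV_HEADS,
        ASSUMED_HEAD_DIM, ASSUMED_BYTES_PER_VALUE,
    )
    print(per_token == KV_CACHE_BYTES_PER_TOKEN)

    print(full_context_cache_gib(KV_CACHE_BYTES_PER_TOKEN,
                                 ASSUMED_CONTEXT_TOKENS))
```

Only the attention layers contribute to this figure. Under the same assumptions, a model with attention in all 42 layers would need 42/4 times as much KV cache per token, which is the point of keeping just 4 attention layers in the hybrid mix.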