Dense
Google · 2026-04
Gemma 4 31B
Dense decoder architecture with GQA + QK-Norm + SWA attention mechanism.
Gemma 4 31B decoder block architecture: Attention: GQA + QK-Norm + SWA with QK-Norm with Sliding Window Attention. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 30.7B, 256K context, 64 layers. Decoder type: Dense.
GQA + QK-Norm + SWA·SwiGLU
30.7B|256K context|GQA + QK-Norm + SWA|Dense
Architecture Specifications
Parameters30.7B
Context Window256K
Decoder TypeDense
AttentionGQA + QK-Norm + SWA
Vocabulary Size262K
Release Date2026-04
CategoryLong Context
OrganizationGoogle
Key Features
Grouped Query AttentionSliding Window AttentionQK normalizationLayer mix: 50 sliding-window + 10 globalKV cache: 840 KiB/token
Enterprise AI platform
Compare, evaluate, and deploy LLM architectures at scale
Colaberry AI provides architecture specifications, benchmark comparisons, and deployment guidance for enterprise AI teams.