MoE
Google · 2026-04
Gemma 4 26B-A4B
MoE decoder architecture with GQA + QK-Norm + SWA attention mechanism.
Gemma 4 26B-A4B decoder block architecture. Attention: grouped-query attention (GQA) with QK-Norm and sliding-window attention (SWA). Normalization: RMSNorm. FFN: Mixture of Experts (3.8B active parameters). Position encoding: RoPE. Scale: 25.2B total parameters, 256K context, 40 layers. Decoder type: MoE.
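To illustrate how a sliding-window layer differs from a global one, here is a minimal sketch of the two causal attention masks in plain Python. The window size of 4 and the helper name `causal_mask` are hypothetical, chosen only for illustration; the card does not state the model's actual window length, and this is not the model's implementation.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query position i may attend to key j.

    window=None gives a full (global) causal mask. An integer window limits
    each query to itself and the previous window - 1 positions.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future tokens
            if window is not None:
                visible = visible and (i - j) < window  # stay inside the window
            row.append(visible)
        mask.append(row)
    return mask


WINDOW = 4  # hypothetical window size, not taken from the card

global_mask = causal_mask(6)                # global layer: full causal history
local_mask = causal_mask(6, window=WINDOW)  # sliding-window layer

for row in local_mask:
    print("".join("x" if v else "." for v in row))
```

In a sliding-window layer the last query sees only the most recent `WINDOW` positions, so keys older than the window need not be kept for that layer, while a global layer must retain the full history.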
Architecture Specifications
Parameters: 3.8B active / 25.2B total
Context Window: 256K
Decoder Type: MoE
Attention: GQA + QK-Norm + SWA
Active Parameters: 3.8B
Vocabulary Size: 262K
Release Date: 2026-04
Category: Mixture of Experts
Organization: Google
Key Features
Grouped Query Attention
Sliding Window Attention
QK normalization
Expert routing
Layer mix: 25 sliding-window + 5 global
KV cache: 210 KiB/token
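As a rough worked example, the KV cache figure of 210 KiB/token can be scaled to the full context window. The sketch below assumes that "256K" means 256 × 1024 tokens and that the per-token cost stays constant across the whole context; both are assumptions, and the second likely overestimates memory if sliding-window layers discard keys outside their window. The function names are illustrative only.

```python
KIB = 1024
GIB = 1024 ** 3

KV_BYTES_PER_TOKEN = 210 * KIB  # from the card: 210 KiB/token
CONTEXT_TOKENS = 256 * 1024     # assumption: "256K" read as 262,144 tokens


def kv_cache_bytes(tokens, bytes_per_token=KV_BYTES_PER_TOKEN):
    """Naive estimate: every cached token costs the same number of bytes."""
    return tokens * bytes_per_token


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


full_context = kv_cache_bytes(CONTEXT_TOKENS)
print(f"{to_gib(full_context):.1f} GiB per sequence at full context")
# prints "52.5 GiB per sequence at full context"
```

Under these assumptions a single full-length sequence needs about 52.5 GiB of KV cache, which is separate from the memory required for the model weights.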