MoE
Google · 2026-04

Gemma 4 26B-A4B

Mixture-of-Experts (MoE) decoder architecture whose attention combines GQA, QK-Norm, and SWA.

Gemma 4 26B-A4B decoder block architecture. Attention: Grouped Query Attention (GQA) with QK-Norm and Sliding Window Attention (SWA). Normalization: RMSNorm. FFN: Mixture of Experts (3.8B active parameters). Position encoding: RoPE. Scale: 25.2B total parameters, 256K context, 40 layers. Decoder type: MoE.
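The block description above can be written down as a small configuration record, which makes the listed components easy to compare across models. This is a minimal sketch in Python: the class name `DecoderSpec` and all field names are illustrative assumptions, not part of any Gemma or Google API, and the values are copied from the description rather than read from a model checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderSpec:
    """Hypothetical container for the listed specs (not an official API)."""

    name: str
    attention: tuple          # attention components, as named in the description
    normalization: str
    ffn: str
    position_encoding: str
    total_params_b: float     # total parameters, in billions
    active_params_b: float    # active parameters, in billions
    context_window: str       # kept as the label used in the description
    num_layers: int


# Values taken directly from the description above.
gemma4_26b_a4b = DecoderSpec(
    name="Gemma 4 26B-A4B",
    attention=("GQA", "QK-Norm", "SWA"),
    normalization="RMSNorm",
    ffn="Mixture of Experts",
    position_encoding="RoPE",
    total_params_b=25.2,
    active_params_b=3.8,
    context_window="256K",
    num_layers=40,
)

print(gemma4_26b_a4b.name, "+".join(gemma4_26b_a4b.attention))
```

The record is frozen so that a spec cannot be modified by accident once it has been created.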

3.8B active / 25.2B total | 256K context | GQA + QK-Norm + SWA | MoE

Architecture Specifications

Parameters: 3.8B active / 25.2B total
Context Window: 256K
Decoder Type: MoE
Attention: GQA + QK-Norm + SWA
Active Parameters: 3.8B
Vocabulary Size: 262K
Release Date: 2026-04
Category: Mixture of Experts
Organization: Google
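The active and total parameter counts above imply how sparse the model is on each forward pass. The short calculation below is a sketch of that arithmetic. The constant and function names are illustrative, and it assumes the rounded figures given in the specifications (3.8B and 25.2B), so the result is approximate.

```python
# Figures as listed in the specifications (rounded, in billions).
ACTIVE_PARAMS_B = 3.8
TOTAL_PARAMS_B = 25.2


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters that are active, as a fraction of the total."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share: {fraction:.1%}")  # prints "Active share: 15.1%"
```

Roughly one parameter in seven is active at a time, which is where the compute savings of an MoE decoder come from relative to a dense model of the same total size.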

Key Features

Grouped Query Attention
Sliding Window Attention
QK normalization
Expert routing
Layer mix: 25 sliding-window + 5 global
KV cache: 210 KiB/token
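The KV cache figure above can be turned into a rough memory estimate for a given context length. The sketch below assumes that "256K" means 262,144 tokens and that the 210 KiB/token figure applies uniformly to every token in the context. Both are assumptions: sliding-window layers may retain fewer tokens than the full context, so the real footprint could be lower. The names are illustrative.

```python
KIB_PER_GIB = 1024 * 1024

KV_CACHE_KIB_PER_TOKEN = 210      # from the feature list above
FULL_CONTEXT_TOKENS = 256 * 1024  # assumption: "256K" = 262,144 tokens


def kv_cache_gib(tokens: int, kib_per_token: float = KV_CACHE_KIB_PER_TOKEN) -> float:
    """Estimate KV cache size in GiB, treating the per-token cost as flat."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    return tokens * kib_per_token / KIB_PER_GIB


full_context_gib = kv_cache_gib(FULL_CONTEXT_TOKENS)
print(f"{FULL_CONTEXT_TOKENS} tokens -> {full_context_gib} GiB")  # 52.5 GiB
```

Under these assumptions a single sequence at full context needs about 52.5 GiB of cache, which is worth budgeting separately from the model weights when sizing a deployment.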