MoE
Unknown · 2026-02

Step 3.5 Flash 196B

Mixture-of-Experts (MoE) decoder architecture using grouped-query attention (GQA) with a 3:1 mix of sliding-window attention (SWA) layers to global attention layers.

Step 3.5 Flash 196B decoder block architecture. Attention: GQA, with sliding-window and global attention layers in a 3:1 ratio. Normalization: RMSNorm. FFN: Mixture of Experts (11B active parameters). Position encoding: RoPE. Scale: 196B total parameters, 262K context, 45 layers. Decoder type: MoE.
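The difference between the two attention layer types mentioned above can be illustrated with their attention masks. The following is a minimal sketch in plain Python, not the model's actual implementation: the function names are hypothetical, and the window size used in the usage note is an arbitrary illustrative value, since this page does not state the model's real window length.

```python
def causal_mask(seq_len):
    """Global causal attention: query q may attend to every key k <= q."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Sliding-window causal attention: query q may attend only to the
    most recent `window` positions, i.e. keys k with q - window < k <= q.
    `window` is a hypothetical parameter; the real value is not given here."""
    return [
        [(k <= q) and (q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Print both masks for a tiny sequence; 1 = may attend, 0 = masked.
    for name, mask in (
        ("global", causal_mask(5)),
        ("sliding (window=2)", sliding_window_mask(5, 2)),
    ):
        print(name)
        for row in mask:
            print(" ".join("1" if allowed else "0" for allowed in row))
```

In a sliding-window layer each token attends to a bounded number of earlier tokens, so the keys and values that layer has to keep do not grow with sequence length, whereas a global layer keeps the full history.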

11B active / 196B total | 262K context | GQA + 3:1 SWA attention | MoE

Architecture Specifications

Parameters: 11B active / 196B total
Context Window: 262K
Decoder Type: MoE
Attention: GQA + 3:1 SWA attention
Active Parameters: 11B
Layers: 45
Hidden Size: 4,096
Vocabulary Size: 129K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown

Key Features

Grouped Query Attention
Sliding Window Attention
Layer mix: 36 sliding-window + 12 global
KV cache: 192 KiB/token
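To put the per-token KV cache figure in context, the arithmetic below scales it linearly to a full context window. This is a rough sketch under stated assumptions: "262K" is taken to mean 262,144 tokens (2^18), and the 192 KiB/token figure is applied to every token. The page does not say how that figure was derived or whether sliding-window layers are already accounted for, so the result is best read as a simple linear estimate rather than a measured number.

```python
KIB_PER_TOKEN = 192          # KV cache per token, from the feature list
CONTEXT_TOKENS = 262_144     # assumption: "262K" means 2**18 tokens


def kv_cache_gib(tokens, kib_per_token=KIB_PER_TOKEN):
    """Linear estimate of KV cache size in GiB for `tokens` cached tokens."""
    return tokens * kib_per_token / (1024 * 1024)  # KiB -> GiB


if __name__ == "__main__":
    print(f"{kv_cache_gib(CONTEXT_TOKENS):.1f} GiB at full context")
    print(f"{kv_cache_gib(8_192):.1f} GiB at 8,192 tokens")
```

Under these assumptions a single sequence at full context comes to 48 GiB of KV cache, and an 8,192-token sequence to 1.5 GiB.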