Step 3.5 Flash 196B

Mixture-of-Experts (MoE) decoder architecture using grouped-query attention (GQA) with a 3:1 ratio of sliding-window attention (SWA) layers to global attention layers.

11B active / 196B total | 262K context | GQA + 3:1 SWA attention | MoE

Architecture Specifications

Parameters: 11B active / 196B total
Context Window: 262K
Decoder Type: MoE
Attention: GQA + 3:1 SWA attention
Active Parameters: 11B
Layers: 45
Hidden Size: 4,096
Vocabulary Size: 129K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown

Grouped Query Attention | Sliding Window Attention | Layer mix: 36 sliding-window + 12 global | KV cache: 192 KiB/token
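
The layer mix and per-token KV cache figures above can be turned into a rough sizing estimate. The sketch below is illustrative only. It assumes that "262K" means 262,144 tokens, that sliding-window and global layers are interleaved in repeating blocks of three sliding layers followed by one global layer (the card gives only the counts, not the ordering), and that every token in the context costs the full 192 KiB. Sliding-window layers normally keep only a bounded window of keys and values, so real memory use at long context is probably lower. The function names `layer_pattern` and `kv_cache_gib` are hypothetical helpers, not part of any library.

```python
# Figures taken from the spec card.
SLIDING_LAYERS = 36
GLOBAL_LAYERS = 12
KV_CACHE_KIB_PER_TOKEN = 192

# Assumption: "262K context" means 262,144 (256 * 1024) tokens.
CONTEXT_TOKENS = 256 * 1024


def layer_pattern(sliding: int, global_: int) -> list[str]:
    """Build a repeating block of sliding layers followed by one global layer.

    The ordering is an assumption; the card states only the layer counts.
    """
    if global_ <= 0 or sliding % global_ != 0:
        raise ValueError("sliding must be a multiple of a positive global count")
    ratio = sliding // global_  # 36 // 12 == 3, matching the 3:1 mix
    block = ["sliding"] * ratio + ["global"]
    return block * global_


def kv_cache_gib(tokens: int, kib_per_token: int) -> float:
    """Naive KV cache size in GiB, charging the full per-token cost to every token."""
    return tokens * kib_per_token / (1024 * 1024)


pattern = layer_pattern(SLIDING_LAYERS, GLOBAL_LAYERS)
print(pattern[:4])
print(len(pattern), pattern.count("global"))
print(kv_cache_gib(CONTEXT_TOKENS, KV_CACHE_KIB_PER_TOKEN), "GiB")
```

Under these assumptions a single sequence at the full context length would need about 48 GiB of KV cache, which is why the share of sliding-window layers matters for long-context serving.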
