MoE
Unknown · 2026-02
Step 3.5 Flash 196B
MoE decoder architecture with grouped-query attention (GQA) and a 3:1 mix of sliding-window (SWA) and global attention layers.
Step 3.5 Flash 196B decoder block architecture. Attention: GQA, with sliding-window and global layers in a 3:1 ratio. Normalization: RMSNorm. FFN: Mixture of Experts (11B active parameters). Position encoding: RoPE. Scale: 196B total parameters, 262K context, 45 layers. Decoder type: MoE.
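As a rough illustration of what a 3:1 sliding-window to global layer mix means, the sketch below builds a layer schedule and the two kinds of causal mask. The placement of the global layer (every fourth layer) and the window size are assumptions made for the example, not details from this page. The names `layer_schedule` and `causal_mask` are hypothetical helpers. The layer count of 48 follows the 36 + 12 layer mix listed under Key Features, which differs from the 45 layers given in the specifications.

```python
from typing import List, Optional

# Sketch of a 3:1 sliding-window / global attention layer schedule.
# ASSUMPTIONS (not from the source): one global layer follows every three
# sliding-window layers, and the window size used below is arbitrary.


def layer_schedule(n_layers: int, swa_per_global: int = 3) -> List[str]:
    """Return 'swa' or 'global' for each layer index."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(n_layers)]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives full causal (global) attention. Otherwise each query
    sees only the most recent `window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look ahead
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


schedule = layer_schedule(48)
print(schedule.count("swa"), schedule.count("global"))  # 36 12
```

In a sliding-window layer the cached keys and values older than the window can be discarded, which is why only the global layers pay a cache cost that grows with the full context length.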
Architecture Specifications
Parameters: 11B active / 196B total
Context Window: 262K
Decoder Type: MoE
Attention: GQA + 3:1 SWA attention
Active Parameters: 11B
Layers: 45
Hidden Size: 4,096
Vocabulary Size: 129K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown
Key Features
Grouped Query Attention
Sliding Window Attention
Layer mix: 36 sliding-window + 12 global
KV cache: 192 KiB/token
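The 192 KiB/token figure can be reproduced with simple arithmetic under one plausible set of assumptions. The sketch below assumes 8 KV heads, a head dimension of 128, and 2 bytes per cached element (bf16 or fp16). None of these three values appear on this page, and any other combination with the same product would give the same total. The helper name `kv_bytes_per_token` is hypothetical. The layer count again follows the 36 + 12 layer mix (48 layers) rather than the 45 layers listed in the specifications.

```python
# Back-of-the-envelope KV-cache size per token.
# ASSUMPTIONS (not from the source): 8 KV heads, head dimension 128,
# 2 bytes per element (bf16/fp16).


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache one token's keys and values across n_layers layers."""
    per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem  # factor 2 = keys + values
    return n_layers * per_layer


N_SWA, N_GLOBAL = 36, 12        # layer mix from the feature list
N_KV_HEADS, HEAD_DIM = 8, 128   # assumed values, for illustration only

total = kv_bytes_per_token(N_SWA + N_GLOBAL, N_KV_HEADS, HEAD_DIM)
global_only = kv_bytes_per_token(N_GLOBAL, N_KV_HEADS, HEAD_DIM)

print(total // 1024, "KiB per token across all layers")      # 192
print(global_only // 1024, "KiB per token in global layers")  # 48
```

Under these assumptions, only the 48 KiB/token contributed by the global layers has to be kept for every token in the context. The remaining 144 KiB/token belongs to sliding-window layers, whose cache is bounded by the window size.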