MoE
Alibaba · 2026-02
Qwen3.5 397B
Mixture-of-Experts (MoE) decoder that interleaves Gated DeltaNet and gated attention layers in a 3:1 ratio.
Qwen3.5 397B decoder block architecture. Attention: 3:1 Gated DeltaNet + Gated Attn. Normalization: RMSNorm. FFN: Mixture of Experts (17B active parameters). Position encoding: RoPE. Scale: 397B total parameters, 262K context, 60 layers (45 Gated DeltaNet + 15 gated attention). Decoder type: MoE.
17B active / 397B total | 262K context | 3:1 Gated DeltaNet + Gated Attn | MoE
Architecture Specifications
Parameters: 17B active / 397B total
Context Window: 262K
Decoder Type: MoE
Attention: 3:1 Gated DeltaNet + Gated Attn
Active Parameters: 17B
Release Date: 2026-02
Category: Hybrid Architecture
Organization: Alibaba
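To put the parameter figures in proportion, the short sketch below computes the share of weights that is active per token and a rough weight footprint. The 2-bytes-per-parameter figure (bf16 storage) and the reading of "B" as 10^9 are assumptions for illustration, not values from this card; the function and constant names are hypothetical.

```python
# Figures taken from the specification list above.
TOTAL_PARAMS_B = 397.0   # total parameters, in billions
ACTIVE_PARAMS_B = 17.0   # parameters active per token, in billions

# Assumption (not stated in the card): weights stored at 2 bytes each (bf16).
BYTES_PER_PARAM = 2


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the fraction of parameters used for a single token."""
    if not 0 < active_b <= total_b:
        raise ValueError("active must be positive and no larger than total")
    return active_b / total_b


def weight_gigabytes(params_b: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Rough weight size in decimal gigabytes, reading 'B' as 1e9 parameters."""
    return params_b * 1e9 * bytes_per_param / 1e9


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"active share per token: {frac:.2%}")
print(f"all weights at bf16:    {weight_gigabytes(TOTAL_PARAMS_B):.0f} GB")
print(f"active weights at bf16: {weight_gigabytes(ACTIVE_PARAMS_B):.0f} GB")
```

Under these assumptions roughly 4% of the weights take part in any one token's forward pass, although all experts still have to be resident in memory (or reachable) to serve arbitrary inputs.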
Key Features
Expert routing
Layer mix: 15 gated attention + 45 DeltaNet
KV cache: 30 KiB/token
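The layer mix and per-token KV cache figure can be checked with a little arithmetic. The sketch below is illustrative only: the interleaving order (three DeltaNet layers followed by one gated attention layer), the reading of "262K" as 262,144 tokens, and the idea that only the gated attention layers contribute to the per-token cache are assumptions, not statements from this card. All names are hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

# Figures from the feature list above.
N_GATED_ATTN = 15
N_DELTANET = 45
KV_BYTES_PER_TOKEN = 30 * KIB

# Assumption: "262K context" means 2**18 tokens.
CONTEXT_TOKENS = 262_144


def layer_schedule(n_deltanet: int, n_attn: int) -> list[str]:
    """Build a hypothetical layer order: `ratio` DeltaNet layers, then one gated attention layer."""
    if n_attn <= 0 or n_deltanet % n_attn != 0:
        raise ValueError("DeltaNet count must be a whole multiple of attention count")
    ratio = n_deltanet // n_attn
    group = ["deltanet"] * ratio + ["gated_attn"]
    return group * n_attn


def kv_cache_bytes(tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> int:
    """Total KV cache for one sequence of `tokens` tokens."""
    return tokens * bytes_per_token


schedule = layer_schedule(N_DELTANET, N_GATED_ATTN)
print("layers:", len(schedule), "first group:", schedule[:4])

full_context_gib = kv_cache_bytes(CONTEXT_TOKENS) / GIB
print(f"KV cache at full context: {full_context_gib} GiB per sequence")

# Assumption: only the gated attention layers hold a per-token cache.
per_attn_layer = KV_BYTES_PER_TOKEN / N_GATED_ATTN
print(f"per attention layer: {per_attn_layer:.0f} bytes/token")
```

With these assumptions a single full-length sequence needs about 7.5 GiB of KV cache. This is the practical appeal of the hybrid layout: the DeltaNet layers keep a fixed-size recurrent state, so cache growth with sequence length comes only from the 15 attention layers.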