Alibaba · 2026-02

Qwen3.5 397B

MoE decoder architecture whose attention stack interleaves Gated DeltaNet and gated attention layers in a 3:1 ratio.
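The 3:1 interleave can be pictured as a layer schedule. This is a minimal sketch that assumes each group of four layers is three Gated DeltaNet layers followed by one gated attention layer. The page gives only the ratio and the 15 + 45 layer mix, so the in-group ordering is an assumption and `hybrid_layer_schedule` is a hypothetical name.

```python
from collections import Counter
from typing import List


def hybrid_layer_schedule(num_groups: int, deltanet_per_group: int = 3) -> List[str]:
    """Return one label per layer for a hybrid stack.

    Assumption: each group is `deltanet_per_group` Gated DeltaNet layers
    followed by a single gated (full) attention layer, i.e. a 3:1 ratio.
    """
    layers: List[str] = []
    for _ in range(num_groups):
        layers.extend(["gated_deltanet"] * deltanet_per_group)
        layers.append("gated_attention")
    return layers


# 15 groups reproduce the listed layer mix: 45 DeltaNet + 15 gated attention.
schedule = hybrid_layer_schedule(num_groups=15)
counts = Counter(schedule)
print(len(schedule), counts["gated_deltanet"], counts["gated_attention"])  # 60 45 15
```

Only one layer in four runs full attention, so only those 15 layers keep a per-token key/value cache.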

Qwen3.5 397B decoder block architecture:
Attention: Gated DeltaNet and gated attention layers interleaved 3:1
Normalization: RMSNorm
FFN: Mixture of Experts (17B active parameters)
Position encoding: RoPE
Scale: 397B parameters, 262K context, 60 layers (15 gated attention + 45 Gated DeltaNet)
Decoder type: MoE
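The Gated DeltaNet layers replace softmax attention with a fixed-size recurrent memory. Below is a minimal single-head sketch of the gated delta rule as published in the Gated DeltaNet paper, S_t = α_t S_{t-1} (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ, with readout o_t = S_t q_t, in plain Python. It assumes scalar per-token gates `alpha` (decay) and `beta` (write strength) and L2-normalized keys. How Qwen3.5 computes these gates, its chunked parallel kernels, and any extra convolutions or output gating are not described on this page and are not modeled; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def gated_delta_step(S: Matrix, k: Vector, v: Vector, alpha: float, beta: float) -> Matrix:
    """One recurrent step: S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T.

    S is d_v x d_k, k has length d_k (assumed L2-normalized), v has length d_v.
    """
    d_v, d_k = len(S), len(k)
    # S k: the value the memory currently returns for key k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Expand alpha * (S - beta (S k) k^T) + beta v k^T entry by entry.
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]


def readout(S: Matrix, q: Vector) -> Vector:
    """o_t = S_t q_t."""
    return [sum(S[i][j] * q[j] for j in range(len(q))) for i in range(len(S))]


S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S = gated_delta_step(S, key, [3.0, 5.0], alpha=1.0, beta=1.0)
print(readout(S, key))  # [3.0, 5.0]
# Writing a new value under the same key overwrites the old one (the delta rule).
S = gated_delta_step(S, key, [7.0, 1.0], alpha=1.0, beta=1.0)
print(readout(S, key))  # [7.0, 1.0]
```

The state S stays d_v × d_k no matter how many tokens have been processed, which is why these layers add nothing to the per-token KV cache.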


Architecture Specifications

Parameters: 17B active / 397B total
Context Window: 262K
Decoder Type: MoE
Attention: 3:1 Gated DeltaNet + Gated Attn
Active Parameters: 17B
Release Date: 2026-02
Category: Hybrid Architecture
Organization: Alibaba
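The gap between 17B active and 397B total parameters comes from sparse expert routing: a router scores the experts for each token and only the top-k experts run, so about 17/397 ≈ 4.3% of the total parameters participate in each token's forward pass. Below is a minimal sketch of top-k routing with softmax weights renormalized over the selected experts. The number of experts, the value of k, and any shared expert or load-balancing loss in Qwen3.5 are not given on this page, so the toy sizes and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def route_top_k(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts; softmax over just those k logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_ffn(x: Vector, router_logits: Sequence[float],
            experts: Sequence[Callable[[Vector], Vector]], k: int) -> Vector:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy experts: each just scales its input (hypothetical stand-ins for FFNs).
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
print(route_top_k([0.5, 2.0, -1.0, 1.0], k=2))  # experts 1 and 3 are selected
print(moe_ffn([1.0, 2.0], [0.0, 5.0, -5.0, 5.0], experts, k=2))  # [3.0, 6.0]
```

Compute per token scales with k, not with the total number of experts, which is how a 397B-parameter model can run with 17B active parameters.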

Key Features

Expert routing
Layer mix: 15 gated attention + 45 Gated DeltaNet layers
KV cache: 30 KiB/token
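The 30 KiB/token figure is consistent with caching keys and values only in the 15 gated attention layers. The per-head settings below (2 KV heads, head dimension 256, 2-byte bf16 values) are assumptions not stated on this page; they are picked because they reproduce the listed 30 KiB, not because the page confirms them. The 262K window is also assumed to mean 262,144 tokens.

```python
def kv_cache_bytes_per_token(num_attn_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of K and V cached per token across all full-attention layers.

    Gated DeltaNet layers keep a fixed-size state instead, so they add nothing here.
    """
    return num_attn_layers * 2 * num_kv_heads * head_dim * bytes_per_elem  # 2 = K and V


# Assumed values: 2 KV heads, head_dim 256, bf16 (2 bytes). Only the count of
# 15 gated attention layers comes from the page.
per_token = kv_cache_bytes_per_token(15, num_kv_heads=2, head_dim=256, bytes_per_elem=2)
print(per_token, per_token / 1024)  # 30720 30.0

# Assumed full window of 262,144 tokens ("262K"), one sequence.
full_context = per_token * 262_144
print(full_context / 2**30)  # 7.5 (GiB)
```

Under the same assumptions, caching all 60 layers would cost four times as much (120 KiB/token); restricting full attention to one layer in four keeps it at 30 KiB.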
