MoE
Unknown · 2026-02
Ling 2.5 1T
Mixture-of-Experts (MoE) decoder architecture that combines Lightning Attention with Multi-head Latent Attention (MLA).
Ling 2.5 1T decoder block architecture:
Attention: Lightning Attention plus MLA
Normalization: RMSNorm
FFN: Mixture of Experts (63B active parameters)
Position encoding: RoPE
Scale: 1T parameters, 256K context, 80 layers
Decoder type: MoE
63B active / 1T total | 256K context | Lightning Attention plus MLA | MoE
Architecture Specifications
Parameters: 63B active / 1T total
Context Window: 256K
Decoder Type: MoE
Attention: Lightning Attention plus MLA
Active Parameters: 63B
Layers: 80
Hidden Size: 8,192
Vocabulary Size: 157K
Release Date: 2026-02
Category: Mixture of Experts
Organization: Unknown
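
To make the active-versus-total parameter figures concrete, the specifications above can be captured in a small record. This is only a sketch: the class name `DecoderSpec` and its field names are hypothetical, "1T" is taken as a round 10**12, "157K" as 157,000, and "256K" is assumed to mean 256 × 1024 tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderSpec:
    """Spec-card values; the class and field names are illustrative only."""
    total_params: int
    active_params: int
    context_tokens: int
    layers: int
    hidden_size: int
    vocab_size: int

    @property
    def active_fraction(self) -> float:
        # Share of the total parameters that are active per token.
        return self.active_params / self.total_params


ling = DecoderSpec(
    total_params=1_000_000_000_000,  # "1T", assumed to be a round 10**12
    active_params=63_000_000_000,    # "63B active"
    context_tokens=256 * 1024,       # assumption: "256K" = 262,144 tokens
    layers=80,
    hidden_size=8192,
    vocab_size=157_000,              # "157K", rounded as on the card
)

print(f"{ling.active_fraction:.1%}")  # prints "6.3%"
```

Under these assumptions, roughly 6.3% of the model's parameters are active for any single token.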
Key Features
Multi-head Latent Attention
Layer mix: 10 MLA + 70 Lightning Attention
KV cache: 11.2 KiB/token
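
The per-token KV cache figure can be scaled to a full context window with simple arithmetic. The sketch below assumes "256K" means 256 × 1024 tokens. It also assumes that only the 10 MLA layers hold a per-token cache, which is typical of linear-attention layers that keep a fixed-size state but is not stated on the card.

```python
KIB = 1024

kv_kib_per_token = 11.2      # from the card
context_tokens = 256 * 1024  # assumption: "256K" = 262,144 tokens
mla_layers, lightning_layers = 10, 70

# KiB -> MiB -> GiB for one sequence filling the whole context window.
total_gib = kv_kib_per_token * context_tokens / KIB / KIB

# Assumption: the per-token cache is split evenly across the MLA layers only.
per_mla_layer_kib = kv_kib_per_token / mla_layers

print(f"{total_gib:.2f} GiB per full-context sequence")        # 2.80 GiB
print(f"{per_mla_layer_kib:.2f} KiB per token per MLA layer")  # 1.12 KiB
```

On these assumptions a single full-length sequence needs about 2.8 GiB of KV cache. The numeric precision of the cache is not given on the card, so the figure is left in bytes rather than converted to element counts.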