Dense
Unknown · 2026-02
Nanbeige 4.1 3B
Dense decoder architecture with grouped-query attention (GQA).
The Nanbeige 4.1 3B decoder block uses GQA for attention, RMSNorm for normalization, SwiGLU for the FFN, and RoPE for position encoding. Scale: 3B parameters, 262K context, 32 layers. Decoder type: Dense.
Architecture Specifications
Parameters: 3B
Context Window: 262K
Decoder Type: Dense
Attention: GQA
Layers: 32
Hidden Size: 2,560
Vocabulary Size: 166K
Release Date: 2026-02
Category: Efficient & Small
Organization: Unknown
Key Features
Grouped Query Attention
Layer mix: 32 GQA
KV cache: 64 KiB/token
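
The KV cache figure of 64 KiB/token can be sanity-checked with a short calculation. The sketch below is illustrative only: the page does not state the number of KV heads, the head dimension, or the cache precision, so NUM_KV_HEADS = 4, HEAD_DIM = 128, and 2 bytes per value (fp16/bf16) are assumptions chosen because they reproduce the listed figure. Any combination with num_kv_heads × head_dim = 512 at 2 bytes per value gives the same result. Treating "262K" as 262,144 tokens is also an assumption.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache per token: one key and one value vector
    per KV head, per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


NUM_LAYERS = 32           # from the specification (32 GQA layers)
NUM_KV_HEADS = 4          # assumption: not stated on the page
HEAD_DIM = 128            # assumption: not stated on the page
CONTEXT_TOKENS = 262_144  # assumption: "262K" read as 2**18 tokens

per_token = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
per_token_kib = per_token / 1024

# Cache size for a single sequence that fills the whole context window.
full_context_gib = per_token * CONTEXT_TOKENS / 2**30

print(f"KV cache per token: {per_token_kib:.0f} KiB")
print(f"KV cache at full context: {full_context_gib:.0f} GiB")
```

Under these assumptions, a single sequence at the full context window would need about 16 GiB of KV cache, which is why the per-token figure matters for long-context deployment planning.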