Cohere · 2026-02
Tiny Aya 3.35B
Dense decoder architecture with grouped-query attention (GQA) and a 3:1 mix of sliding-window and global attention layers.
Tiny Aya 3.35B decoder block: Attention: GQA with a 3:1 ratio of sliding-window to global attention layers. Normalization: RMSNorm. FFN: SwiGLU. Position encoding: RoPE. Scale: 3.35B parameters, 8K context, 24 layers. Decoder type: Dense.
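For readers less familiar with the block's normalization and feed-forward components, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU FFN. It is a generic illustration of those two techniques, not Cohere's implementation: the toy widths, random weights, and `eps` value are assumptions, and the sublayer wiring (pre- or post-norm, residual connections) is not specified here, so it is left out.

```python
import math
import random


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(z):
    """SiLU (swish) activation z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(matrix, x):
    """Multiply a matrix stored as a list of rows by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: project down the elementwise product silu(gate(x)) * up(x)."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def random_matrix(rows, cols, rng):
    """Small random weights for the toy example (hypothetical values)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (hypothetical): model width 4, FFN hidden width 8.
rng = random.Random(0)
x = [1.0, -2.0, 3.0, 0.5]
normed = rms_norm(x, weight=[1.0] * 4)
out = swiglu_ffn(normed,
                 w_gate=random_matrix(8, 4, rng),
                 w_up=random_matrix(8, 4, rng),
                 w_down=random_matrix(4, 8, rng))
print(len(out))  # 4: the FFN maps back to the model width
```

The gate and up projections widen the vector to the FFN hidden size; the SiLU-gated product is then projected back down to the model width.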
Architecture Specifications
Parameters: 3.35B
Context Window: 8K
Decoder Type: Dense
Attention: GQA + 3:1 SWA attention
Release Date: 2026-02
Category: Efficient & Small
Organization: Cohere
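The Attention entry above combines two mechanisms. A minimal pure-Python sketch of each follows, under stated assumptions: sliding-window layers restrict every query to the most recent `window` positions of the causal prefix, global layers see the whole prefix, and GQA lets consecutive groups of query heads share one key/value head. The window size and head counts in the example are hypothetical placeholders, not published Tiny Aya values.

```python
def attention_mask(seq_len, window=None):
    """Return a causal mask as a list of lists of bools.

    mask[i][j] is True when query position i may attend to key position j.
    window=None gives a global (full causal) mask; an integer window limits
    each query to the `window` most recent positions, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            in_window = window is None or i - j < window
            row.append(causal and in_window)
        mask.append(row)
    return mask


def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """GQA: consecutive groups of query heads share one KV head."""
    assert num_q_heads % num_kv_heads == 0, "query heads must split evenly"
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


# Hypothetical sliding window of 3 over a 6-token sequence.
for row in attention_mask(6, window=3):
    print("".join("x" if allowed else "." for allowed in row))

# Hypothetical 16 query heads sharing 4 KV heads.
print([kv_head_for_query_head(h, 16, 4) for h in range(16)])
```

Each printed row is one query position, and `x` marks a key position it may attend to. A global layer uses the same function with `window=None`. Because groups of query heads read the same K/V tensors under GQA, only the KV heads need to be cached.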
Key Features
Grouped Query Attention
Sliding Window Attention
RoPE embeddings
Layer mix: 27 sliding-window + 9 global
KV cache: 72 KiB/token
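The layer mix and KV-cache figures above can be checked with a little arithmetic. The listed mix sums to 27 + 9 = 36 layers at a 3:1 ratio. The sketch below assumes each group of three sliding-window layers is followed by one global layer (a common convention for 3:1 mixes, not confirmed here for Tiny Aya) and that "8K" means 8,192 tokens.

```python
KIB = 1024
MIB = 1024 * 1024


def layer_pattern(num_sliding, num_global, ratio=3):
    """Interleave `ratio` sliding-window layers, then one global layer, repeatedly.

    The ordering is an assumed convention for a 3:1 mix; it is not taken
    from a Tiny Aya config.
    """
    assert num_sliding == ratio * num_global, "counts must match the ratio"
    pattern = []
    for _ in range(num_global):
        pattern.extend(["sliding"] * ratio)
        pattern.append("global")
    return pattern


def kv_cache_bytes(per_token_kib, num_tokens):
    """Total KV cache if every token keeps the stated per-token footprint."""
    return per_token_kib * KIB * num_tokens


layers = layer_pattern(27, 9)
print(len(layers), layers[:4])  # 36 ['sliding', 'sliding', 'sliding', 'global']

total = kv_cache_bytes(72, 8192)  # assumes "8K" means 8192 tokens
print(total / MIB, "MiB")  # 576.0 MiB
```

This multiplies the stated per-token figure by the sequence length directly, giving 576 MiB for one full 8,192-token sequence. It does not model any memory that sliding-window layers could save by discarding tokens outside their window.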