---
base_model:
- Qwen/Qwen3-Next-80B-A3B-Thinking
tags:
- text-generation-inference
license: apache-2.0
---

![qwen3-next-thinking](https://cdn-uploads.huggingface.co/production/uploads/68121d80da035a609e569a81/tXHS7ClRRpoA4sZ2qPegd.png)

**Qwen3-Next-REAP-15B-A3B-Thinking** has the following specifications:
- **Type**: Causal Language Model
- **Number of Parameters**: 15B in total, 3B activated
- **Hidden Dimension**: 2048
- **Number of Layers**: 48
- **Hybrid Layout**: 12 * (3 * (Gated DeltaNet -> MoE) -> 1 * (Gated Attention -> MoE))
- **Gated Attention**:
  - **Number of Attention Heads**: 16 for Q and 2 for KV
  - **Head Dimension**: 256
  - **Rotary Position Embedding Dimension**: 64
- **Gated DeltaNet**:
  - **Number of Linear Attention Heads**: 32 for V and 16 for QK
  - **Head Dimension**: 128
- **Mixture of Experts**:
  - **Number of Experts**: 96 (uniformly pruned from 512)
  - **Number of Activated Experts**: 10
  - **Number of Shared Experts**: 1
- **Context Length**: 262,144 tokens natively, extensible up to 1,010,000 tokens
- **Compression Method**: REAP (Router-weighted Expert Activation Pruning)
- **Compression Ratio**: 81.25% of experts pruned
- **Specialization**: Math, Physics, Control Engineering, Scientific Writing

Test video (Q3_K): https://www.bilibili.com/video/BV1T7zjBWEXc/?vd_source=448090107c928cea02cdf07046d02784
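
The layer layout and compression figures above follow from simple arithmetic; a minimal sketch (using only the numbers stated in this card, not the actual weights) shows how they fit together:

```python
# Sanity-check the architecture arithmetic from the spec list above.
# All numbers are taken from the model card; this does not load the model.

blocks = 12              # repeated hybrid blocks
deltanet_per_block = 3   # Gated DeltaNet -> MoE sublayers per block
attention_per_block = 1  # Gated Attention -> MoE sublayers per block
layers = blocks * (deltanet_per_block + attention_per_block)
print(layers)            # 48, matching "Number of Layers"

experts_original = 512   # experts per MoE layer in the base 80B model
experts_kept = 96        # experts remaining after REAP pruning
pruned_fraction = 1 - experts_kept / experts_original
print(f"{pruned_fraction:.2%}")  # 81.25%, matching "Compression Ratio"
```

Note that pruning removes whole experts, so the number of *activated* experts per token (10 routed plus 1 shared) is unchanged, which is why the activated parameter count stays near 3B.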