---
base_model:
- Qwen/Qwen3-Next-80B-A3B-Thinking
tags:
- text-generation-inference
license: apache-2.0
---

![qwen3-next-thinking](https://cdn-uploads.huggingface.co/production/uploads/68121d80da035a609e569a81/tXHS7ClRRpoA4sZ2qPegd.png)

**Qwen3-Next-REAP-15B-A3B-Thinking** has the following specifications:
- **Type**: Causal Language Model
- **Number of Parameters**: 15B in total, 3B activated
- **Hidden Dimension**: 2048
- **Number of Layers**: 48
- **Hybrid Layout**: 12 * (3 * (Gated DeltaNet -> MoE) -> 1 * (Gated Attention -> MoE))
- **Gated Attention**:
  - **Number of Attention Heads**: 16 for Q and 2 for KV
  - **Head Dimension**: 256
  - **Rotary Position Embedding Dimension**: 64
- **Gated DeltaNet**:
  - **Number of Linear Attention Heads**: 32 for V and 16 for QK
  - **Head Dimension**: 128
- **Mixture of Experts**:
  - **Number of Experts**: 96 (uniformly pruned from 512)
  - **Number of Activated Experts**: 10
  - **Number of Shared Experts**: 1
- **Context Length**: 262,144 tokens natively, extensible up to 1,010,000 tokens
- **Compression Method**: REAP (Router-weighted Expert Activation Pruning)
- **Compression Ratio**: 81.25% of experts pruned
- **Specialization**: Math, Physics, Control Engineering, Scientific Writing

Test video (Q3_K): https://www.bilibili.com/video/BV1T7zjBWEXc/?vd_source=448090107c928cea02cdf07046d02784
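
The layer layout and compression figures above follow from simple arithmetic; a minimal sketch (using only the numbers stated in this card, not the actual weights) shows how they fit together:

```python
# Sanity-check the architecture arithmetic from the spec list above.
# All numbers are taken from the model card; this does not load the model.

blocks = 12              # repeated hybrid blocks
deltanet_per_block = 3   # Gated DeltaNet -> MoE sublayers per block
attention_per_block = 1  # Gated Attention -> MoE sublayers per block
layers = blocks * (deltanet_per_block + attention_per_block)
print(layers)            # 48, matching "Number of Layers"

experts_original = 512   # experts per MoE layer in the base 80B model
experts_kept = 96        # experts remaining after REAP pruning
pruned_fraction = 1 - experts_kept / experts_original
print(f"{pruned_fraction:.2%}")  # 81.25%, matching "Compression Ratio"
```

Note that pruning removes whole experts, so the number of *activated* experts per token (10 routed plus 1 shared) is unchanged, which is why the activated parameter count stays near 3B.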