Our new blog post: Smaller Models, Smarter Agents 🚀
https://huggingface.co/blog/yanghaojin/greenbit-3-bit-stronger-reasoning

DeepSeek's R1-0528 proved that an 8B model can reason like a 235B one. Anthropic showed that multi-agent systems boost performance by 90%. The challenge? Both approaches burn massive compute and tokens.

💡 GreenBitAI cracked the code: we launched the first deployable 3-bit reasoning model, DeepSeek-R1-0528-Qwen3-8B (3.2-bit).

✅ Runs complex multi-agent research tasks (e.g. a Pop Mart market analysis)
✅ Executes flawlessly on an Apple M3 laptop in under 5 minutes
✅ 1351 tokens/s prefill, 105 tokens/s decode
✅ Near-FP16 reasoning quality with just 30–40% of the token usage

This is how extreme compression meets collaborative intelligence, making advanced reasoning practical on edge devices.
Please check out our recent blog post, "GPU Poor Savior: Revolutionizing Low-Bit Open Source LLMs and Cost-Effective Edge Computing", which presents a cheaper and more efficient SFT scheme for quantized LLMs.
We are happy to share that we have just open-sourced over 200 low-bit LLMs. For the MLX community, we have prepared 2–4 bit versions of mainstream LLMs, available in the following collection: GreenBitAI/greenbitai-mlx-llm-6614eb6ceb8da657c2b4ed58.