---
pipeline_tag: text-generation
license: other
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE
library_name: vllm
tags:
- gptq
- 4-bit
- quantization
- minimax
- moe
- w4a16
- vllm
datasets:
- allenai/c4
- allenai/ai2_arc
- openai/gsm8k
- openai/openai_humaneval
- tatsu-lab/alpaca
base_model:
- ModelCloud/MiniMax-M2-BF16
- MiniMaxAI/MiniMax-M2
---

# MiniMax-M2-GPTQ-Int4

This repository contains a **4-bit quantized version** of the MiniMax-M2 model.

## Quantization Details

The quantization was performed with **[GPTQModel](https://github.com/ModelCloud/GPTQModel)**, using an experimental modification that **feeds the whole calibration dataset to each expert** to improve quantization quality for the MoE layers.

**Calibration Dataset:** 1536 samples in total: c4/en (1024), arc (164), gsm8k (164), humaneval (164), alpaca (20). A sketch for assembling a similar mix is given in the Calibration Sketch section below.

**Hardware & Performance:** This model is verified to run with Tensor Parallel (TP) on **8x NVIDIA RTX 3090** GPUs with a context window of **192,500 tokens**.

## Quick Start

To serve the model using **vLLM**, please use the following branch, which includes specific fixes for loading this model:
[https://github.com/avtc/vllm/tree/feature/fix-gptq-m2-load-gemini](https://github.com/avtc/vllm/tree/feature/fix-gptq-m2-load-gemini)

**Sample run command (8x 3090):**

```bash
export VLLM_ATTENTION_BACKEND="FLASHINFER"
export TORCH_CUDA_ARCH_LIST="8.6"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_MARLIN_USE_ATOMIC_ADD=1
export SAFETENSORS_FAST_GPU=1

vllm serve avtc/MiniMax-M2-GPTQMODEL-W4A16 \
  -tp 8 \
  --port 8000 \
  --host 0.0.0.0 \
  --uvicorn-log-level info \
  --trust-remote-code \
  --gpu-memory-utilization 0.925 \
  --max-num-seqs 1 \
  --dtype=float16 \
  --seed 1234 \
  --max-model-len 192500 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --enable-auto-tool-choice \
  --enable-sleep-mode \
  --compilation-config '{"level": 3, "cudagraph_capture_sizes": [1], "cudagraph_mode": "PIECEWISE"}'
```

**Recommended Sampling Parameters:**

```json
{
  "top_p": 0.95,
  "temperature": 1.0,
  "repetition_penalty": 1.05,
  "top_k": 40,
  "min_p": 0.0
}
```

For some tasks, a temperature of 0.6 works better. See the Example Client Request section below for how to pass these parameters through vLLM's OpenAI-compatible API.

### Example Output

**Prompt:**

> Make an html animation of fishes in an aquarium. The aquarium is pretty, the fishes vary in colors and sizes and swim realistically. You can left click to place a piece of fish food in aquarium. Each fish chases a food piece closest to it, trying to eat it. Once there are no more food pieces, fishes resume swimming as usual.

**Result:** The model generated a working artifact using Kilo Code in Code mode.
[View the Result on JSFiddle](https://jsfiddle.net/728rkbdg/)

### Acknowledgments

Special thanks to the **GPTQModel** team for the quantization tools and support.
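### Calibration Sketch

The snippet below is a minimal sketch of how a comparable calibration mix could be assembled with 🤗 `datasets` and fed to a stock GPTQModel run. It is not the exact recipe used for this repo: the `group_size`, the per-dataset splits and text fields, and the batch size are assumptions, and the "whole dataset to each expert" behavior described above requires the experimental GPTQModel modification, which upstream does not include.

```python
from itertools import islice

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

texts = []

# c4/en: 1024 samples (streamed to avoid downloading the full corpus)
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
texts += [row["text"] for row in islice(c4, 1024)]

# arc: 164 samples (ARC-Challenge questions; field choice is an assumption)
arc = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="train")
texts += [row["question"] for row in arc.select(range(164))]

# gsm8k: 164 samples (question + reference answer)
gsm8k = load_dataset("openai/gsm8k", "main", split="train")
texts += [f'{row["question"]}\n{row["answer"]}' for row in gsm8k.select(range(164))]

# humaneval: all 164 problems (prompt + canonical solution)
humaneval = load_dataset("openai/openai_humaneval", split="test")
texts += [row["prompt"] + row["canonical_solution"] for row in humaneval]

# alpaca: 20 samples (instruction + reference output)
alpaca = load_dataset("tatsu-lab/alpaca", split="train")
texts += [f'{row["instruction"]}\n{row["output"]}' for row in alpaca.select(range(20))]

assert len(texts) == 1536  # matches the sample count stated above

# bits=4 matches this repo; group_size=128 is a common default, assumed here.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("ModelCloud/MiniMax-M2-BF16", quant_config)
model.quantize(texts, batch_size=1)  # the per-expert feed needs the custom patch
model.save("MiniMax-M2-GPTQ-Int4")
```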
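### Example Client Request

The recommended sampling parameters map onto vLLM's OpenAI-compatible API as shown below; `top_k`, `min_p`, and `repetition_penalty` are not part of the standard OpenAI schema, so they are passed via `extra_body`. This is a sketch assuming the server was started with the run command above (port 8000, placeholder API key).

```python
from openai import OpenAI

# Point the client at the vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="avtc/MiniMax-M2-GPTQMODEL-W4A16",
    messages=[{"role": "user", "content": "Write a haiku about an aquarium."}],
    temperature=1.0,  # try 0.6 for tasks that benefit from more determinism
    top_p=0.95,
    # vLLM-specific sampling knobs go through extra_body:
    extra_body={"top_k": 40, "min_p": 0.0, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```

Because the server is launched with `--enable-auto-tool-choice` and `--tool-call-parser minimax_m2`, standard OpenAI-style `tools` / `tool_choice` arguments should also work on this endpoint; see the Tool Calling Guide linked below.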

-----

## ✨ Original Model Highlights

Join our 💬 WeChat | 🧩 Discord community.

MiniMax Agent | ⚡️ API (now free for a limited time!) | MCP | MiniMax Website

🤗 Hugging Face | 🐙 GitHub | 🤖️ ModelScope | 📄 License: MIT
# Meet MiniMax-M2

Today, we release and open-source MiniMax-M2, a **Mini** model built for **Max** coding & agentic workflows.

**MiniMax-M2** redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool-use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.

-----

## Highlights

**Superior Intelligence.** According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. **Its composite score ranks #1 among open-source models globally.**

**Advanced Coding.** Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, code-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench-style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.

**Agent Performance.** MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and gracefully recovers from flaky steps.

**Efficient Design.** With 10 billion activated parameters (230 billion in total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling, perfectly aligned with the shift toward highly deployable models that still shine on coding and agentic tasks.

-----

## Why activation size matters

By keeping activations around **10B**, the plan → act → verify loop of an agentic workflow stays responsive and cheap to run:

- **Faster feedback cycles** in compile-run-test and browse-retrieve-cite chains.
- **More concurrent runs** on the same budget for regression suites and multi-seed explorations.
- **Simpler capacity planning** with smaller per-request memory and steadier tail latency.

In short: **10B activations = responsive agent loops + better unit economics**.

## At a glance

If you need frontier-style coding and agents without frontier-scale costs, **MiniMax-M2** hits the sweet spot: fast inference, robust tool-use capabilities, and a deployment-friendly footprint. We look forward to your feedback and to collaborating with developers and researchers to bring the future of intelligent collaboration one step closer.

## Tool Calling Guide

Please refer to our [Tool Calling Guide](https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/docs/tool_calling_guide.md).

# Contact Us

Contact us at [model@minimax.io](mailto:model@minimax.io) | [WeChat](https://github.com/MiniMax-AI/MiniMax-AI.github.io/blob/main/images/wechat-qrcode.jpeg).