
Qwen3.6-27B-Omnimerge-v4-GGUF

GGUF quantizations of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 — the MLP-passthrough variant that defends against the Qwen3.6 think-policy fragility we discovered. Source dtype is BF16; this repo provides the standard bartowski quant ladder (F16 → IQ2_XXS) for llama.cpp.

Source model: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 (BF16 weights, model card with full benchmarks and methodology). NOT a quant of clean Qwen/Qwen3.6-27B — these GGUFs contain the v4 merge.

All quants were made using an importance matrix (imatrix) computed on calibration data v5, the same calibration set bartowski uses for the Qwen3.6 base release, so quality fingerprints are directly comparable to bartowski's Qwen_Qwen3.6-27B-GGUF repo.

Why this merge exists

A same-base DARE-TIES merge (the Omnimerge_v2 method) of Qwen/Qwen3.6-27B plus three Qwen3.6 fine-tunes. It is the direct successor to ManniX-ITA/Qwen3.5-27B-Omnimerge-v2 on the newer Qwen3.6 base, with mlp.{gate,up,down}_proj copied verbatim from clean Qwen3.6 (the "MLP-passthrough" surgery) to defend against a Qwen3.6-specific reasoning-tag fragility we found during forensic delta inspection. See the v4 model card for the full story, scripts, and benchmark methodology.

Benchmark headline (Q6_K, head-to-head vs Qwen3.6 base + Omnimerge-v2)

All scored under identical llama.cpp + lm_eval conditions (--reasoning-format deepseek --reasoning-budget 8192 --parallel 2, raw /v1/completions, no chat template).

| Benchmark | Qwen3.6 base Q6_K (bartowski) | Omnimerge-v2 (Qwen3.5 base) | Omnimerge-v4-MLP (this repo) | Δ vs base | Δ vs v2 |
|---|---|---|---|---|---|
| HumanEval pass@1 (164q) | 84.76% | 79.27% | 84.76% | 0.00 pp | +5.49 pp |
| MBPP pass@1 (500q), corrected* | 57.60% | 74.60% | 73.40% | +15.80 pp | −1.20 pp |
| GPQA Diamond pass@1 (flex) | not measured | 69.19% (full 198q) | ≈ 84.75% (partial 177q‡) | n/a | ≈ +15.5 pp |

* MBPP scores are post-<think>-stripping: lm_eval's raw scorer raises a SyntaxError on the literal < when it runs exec(prompt + completion + tests). See the v4 model card for the per-model recovery breakdown.
‡ GPQA crashed on the at-budget reasoning tail (an aiohttp lifecycle bug in lm_eval); 192/198 responses were cached and 177 matched, so the headline is expected to land in the 82–86% band.

Available Quantizations

All 27 files (F16 + 26 imatrix-quantized tiers, ~417 GB total) are uploaded and ready. imatrix.dat (used for every quant) is in the repo root for audit and reproduction.
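
Each tier is a standalone file, so there is no need to pull the full ~417 GB. A sketch using the Hugging Face CLI, assuming the per-tier filenames follow the naming scheme used elsewhere in this card:

# Download a single tier (Q4_K_M here) instead of the whole repo:
huggingface-cli download ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF \
    --include "*Q4_K_M*" --local-dir ./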

| Quantization | File size | Use case |
|---|---|---|
| F16 (full precision) | 50.11 GB | Conversion source / lossless reference |
| Q8_0 | 26.63 GB | Highest fidelity, large |
| Q6_K_L | 21.14 GB | Q6_K with embed/output at Q8_0 |
| Q6_K | 20.57 GB | Recommended high tier — eval methodology used this |
| Q5_K_L | 18.64 GB | Q5_K_M with embed/output at Q8_0 |
| Q5_K_M | 17.91 GB | Strong fidelity, balanced |
| Q5_K_S | 17.40 GB | Slightly smaller K-mix |
| Q4_K_L | 16.29 GB | Q4_K_M with embed/output at Q8_0 |
| Q4_1 | 15.91 GB | Legacy 4-bit, dense |
| Q4_K_M | 15.41 GB | Recommended balanced tier for most users |
| IQ4_NL | 14.72 GB | Importance-aware 4-bit non-linear |
| Q4_K_S | 14.52 GB | K-mix small variant |
| Q4_0 | 14.41 GB | Legacy 4-bit |
| IQ4_XS | 14.05 GB | IQ4 extra-small |
| Q3_K_XL | 13.42 GB | Q3_K_L with embed/output at Q8_0 |
| Q3_K_L | 13.36 GB | 3-bit K-mix large |
| Q3_K_M | 12.39 GB | 3-bit K-mix medium |
| IQ3_M | 11.72 GB | Importance-aware 3-bit medium |
| Q3_K_S | 11.24 GB | 3-bit K-mix small |
| IQ3_XS | 11.15 GB | IQ3 extra-small |
| Q2_K_L | 11.13 GB | Q2_K with embed/output at Q8_0 |
| IQ3_XXS | 10.42 GB | IQ3 extra-extra-small |
| Q2_K | 9.98 GB | 2-bit K-mix |
| IQ2_M | 9.32 GB | Importance-aware 2-bit medium |
| IQ2_S | 8.72 GB | IQ2 small |
| IQ2_XS | 8.47 GB | IQ2 extra-small |
| IQ2_XXS | 7.85 GB | IQ2 extra-extra-small (smallest) |

How to Use

With llama.cpp:

# Recommended args for reasoning-tag-emitting models (matches the eval methodology):
llama-server \
    -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
    -c 32768 -ngl 99 -t 12 --no-warmup \
    --reasoning-format deepseek --reasoning-budget 8192

Swap Q4_K_M for any tier from the table above. Q6_K matches the methodology used in our published evals; Q4_K_M is the typical "balanced" choice for most users.
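
Once the server is up it exposes an OpenAI-compatible API. A quick smoke test, assuming llama-server's default port 8080 (adjust if you override it):

# Minimal request against the OpenAI-compatible endpoint:
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 64}'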

For multimodal (vision) inference: the mmproj projector is in bartowski/Qwen_Qwen3.6-27B-GGUF and works with this model unchanged (vision tower is preserved verbatim from the base).
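
A vision-enabled launch sketch, assuming a llama.cpp build with multimodal server support and the projector fetched from bartowski's repo (the projector filename below is illustrative):

# Pass the base model's projector alongside the merged weights:
llama-server \
    -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
    --mmproj mmproj-Qwen3.6-27B-f16.gguf \
    -c 32768 -ngl 99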

With ollama: use a Modelfile pointing at one of the GGUFs above (a minimal sketch follows), or run a tier directly from the Hub, e.g. ollama run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M.
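
A minimal Modelfile sketch (local path and model tag are yours to choose):

# Modelfile: point ollama at a downloaded GGUF tier
FROM ./Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf

Then build and run it:

ollama create qwen3.6-omnimerge-v4 -f Modelfile
ollama run qwen3.6-omnimerge-v4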

imatrix.dat

The imatrix.dat (~14 MB) used to generate every quant in this repo is uploaded alongside the GGUFs at the repo root. Reproducible, auditable.
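
With the F16 reference and imatrix.dat, any tier can be rebuilt with stock llama.cpp tools. A sketch, assuming filenames follow this repo's naming scheme and that the calibration file is named calibration_datav5.txt (an assumption):

# Regenerate the importance matrix (calibration filename is an assumption):
llama-imatrix -m Qwen3.6-27B-Omnimerge-v4-F16.gguf \
    -f calibration_datav5.txt -o imatrix.dat

# Rebuild a tier from the F16 reference using the published matrix:
llama-quantize --imatrix imatrix.dat \
    Qwen3.6-27B-Omnimerge-v4-F16.gguf \
    Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf Q4_K_M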

Reproducing

See scripts/ on the source v4 model repo:

  • dare_ties_merge.py — main merger (auto-detects Qwen3.6 base via output_gate_type and applies MLP-skip)
  • v4_mlp_passthrough.py — post-process: rebuild merged dir with MLP layers from base
  • quantize_gguf.py — the script that built this repo

For dense (non-Gemma-4-MoE) models, pass --exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K to skip ContribDynamic tiers (those require Gemma 4 expert-contribution maps).
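
The script's actual CLI lives in scripts/ on the source repo; only the --exclude flag above is confirmed. A purely hypothetical invocation shape for a dense model:

# Hypothetical invocation; check scripts/quantize_gguf.py for the real interface:
python quantize_gguf.py Qwen3.6-27B-Omnimerge-v4 \
    --exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K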

License

Apache-2.0 (inherited from Qwen/Qwen3.6-27B and the fine-tune sources).

Acknowledgements

Thanks to bartowski for the quant ladder conventions, the shared calibration data v5, and the Qwen3.6 mmproj projector referenced above, and to the llama.cpp project for the quantization and inference tooling.
