Made with llm-compressor. The script below applies data-free W4A16 quantization to the expert MLP linears only; everything else is left unquantized in bf16:

import re
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor import oneshot

MODEL_ID = "huihui-ai/Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated"
OUTPUT_DIR = "./Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated-ExpertsOnly-W4A16"

# Target only the expert MLP linears via regex
expert_pattern = r".*mlp\.experts\.\d+\.(gate_proj|up_proj|down_proj)$"
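
# Quick sanity check of the pattern. The module names below are
# illustrative, assuming the Qwen3-Next layout in transformers; the real
# names can be listed with model.named_modules().
assert re.match(expert_pattern, "model.layers.0.mlp.experts.7.gate_proj")
assert re.match(expert_pattern, "model.layers.12.mlp.experts.7.down_proj")
assert not re.match(expert_pattern, "model.layers.0.mlp.shared_expert.gate_proj")
assert not re.match(expert_pattern, "model.layers.0.self_attn.q_proj")
assert not re.match(expert_pattern, "lm_head")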

recipe = [
    QuantizationModifier(
        scheme="W4A16",
        # Regex target: only expert MLP linear layers
        targets=[f"re:{expert_pattern}"],
        ignore=["lm_head"],
    )
]
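
# Optional pre-flight check (a sketch; assumes `accelerate` is installed):
# instantiate the model on the meta device, which downloads no weights,
# and count how many modules the recipe will actually capture.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(MODEL_ID, trust_remote_code=True)
with init_empty_weights():
    meta_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
n_matched = sum(
    bool(re.match(expert_pattern, name)) for name, _ in meta_model.named_modules()
)
print(f"expert linears targeted by the recipe: {n_matched}")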

# Run oneshot in "data-free" mode (no calibration dataset)
quantized_model = oneshot(
    model=MODEL_ID,
    precision="bf16",
    trust_remote_code_model=True,
    recipe=recipe,
    dataset=None,
    num_calibration_samples=0,
    quantization_aware_calibration=False,
    max_seq_length=1,           # irrelevant here, but keeps dataset pipeline trivial
    output_dir=OUTPUT_DIR,
    save_compressed=True,
)
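
Once saved, the compressed checkpoint can be loaded for inference. A minimal serving sketch, assuming a vLLM build that supports the Qwen3-Next architecture and compressed-tensors checkpoints:

from vllm import LLM, SamplingParams

llm = LLM(model=OUTPUT_DIR)  # or the Hub repo id after uploading
sampling = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Give me a short introduction to large language models."], sampling)
print(outputs[0].outputs[0].text)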