Made with llm-compressor:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "huihui-ai/Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated"
OUTPUT_DIR = "./Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated-ExpertsOnly-W4A16"

# Target only the expert MLP linears via regex; attention, router, and all
# other projections are left at their original precision.
expert_pattern = r".*mlp\.experts\.\d+\.(gate_proj|up_proj|down_proj)$"
recipe = [
    QuantizationModifier(
        scheme="W4A16",
        # The "re:" prefix tells llm-compressor to interpret the target as a regex
        targets=[f"re:{expert_pattern}"],
        ignore=["lm_head"],
    )
]
# Run oneshot in "data-free" mode: this weight-only scheme needs no
# calibration data, so the dataset pipeline is disabled entirely.
quantized_model = oneshot(
    model=MODEL_ID,
    precision="bf16",
    trust_remote_code_model=True,
    recipe=recipe,
    dataset=None,
    num_calibration_samples=0,
    quantization_aware_calibration=False,
    max_seq_length=1,  # irrelevant here, but keeps the dataset pipeline trivial
    output_dir=OUTPUT_DIR,
    save_compressed=True,
)
```
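The saved directory is a compressed-tensors checkpoint, so it can be loaded directly for inference. A minimal sketch, assuming a recent vLLM build with Qwen3-Next and compressed-tensors W4A16 support:

```python
from vllm import LLM, SamplingParams

# vLLM reads the compressed-tensors quantization config from the
# checkpoint directory automatically.
llm = LLM(
    model="./Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated-ExpertsOnly-W4A16",
    trust_remote_code=True,
)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in one paragraph."],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```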
Model tree for coughmedicine/Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated-W4A16:

- Base model: Qwen/Qwen3-Next-80B-A3B-Instruct