---
license: apache-2.0
base_model:
- Jackrong/Qwopus3.5-27B-v3
tags:
- qwen3_5
- qwopus
- abliterated
- uncensored
- Claude
- reasoning
- chain-of-thought
- conversational
- gguf
language:
- en
- it
- zh
pipeline_tag: text-generation
---

# Qwopus3.5-27B-v3-Abliterated

This is an **uncensored/abliterated** version of [Jackrong/Qwopus3.5-27B-v3](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3), a Claude 4.6 Opus reasoning-distilled fine-tune of Qwen3.5-27B.

Abliteration removes the model's refusal behavior without retraining, by contrasting activations on harmful vs. harmless prompts. The technique is based on [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers).

Inspired by the work of [HuiHui-AI](https://huggingface.co/huihui-ai).

## Abliteration Details

- **Method:** Refusal-direction ablation via activation contrast
- **Harmful prompts:** 512 from AdvBench (520 pool)
- **Harmless prompts:** 512 from Alpaca-cleaned (31.8K pool)
- **Refusal direction:** Layer 61/64 (strongest separation, norm: 158.28)
- **Ablated layers:** 2-61 (60 layers, skipping the first 2 and last 2)
- **Ablated weights:** `self_attn.o_proj`, `linear_attn.o_proj`, `mlp.down_proj` (75 matrices modified)
- **Format:** BF16 safetensors (same as the source model)

## Model Details

| Property | Value |
|---|---|
| Base Model | [Jackrong/Qwopus3.5-27B-v3](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3) |
| Architecture | Qwen3.5 (hybrid attention + GatedDeltaNet) |
| Parameters | ~28B |
| Context Length | 131,072 tokens |
| Format | BF16 Safetensors + GGUF (F16, Q4_K_M) |
| License | Apache 2.0 |

## Usage (standard BF16/GGUF)

### With transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "croll83/Qwopus3.5-27B-v3-Abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("croll83/Qwopus3.5-27B-v3-Abliterated")
messages = [{"role": "user", "content": "Hello, how are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### With vLLM

```bash
vllm serve croll83/Qwopus3.5-27B-v3-Abliterated --dtype bfloat16
```

### With llama.cpp (GGUF)

Two GGUF versions are provided in this repo:

| File | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [Qwopus3.5-27B-v3-Abliterated-f16.gguf](https://huggingface.co/croll83/Qwopus3.5-27B-v3-Abliterated/resolve/main/Qwopus3.5-27B-v3-Abliterated-f16.gguf) | F16 | ~54 GB | 16.0 | Full precision, lossless |
| [Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf](https://huggingface.co/croll83/Qwopus3.5-27B-v3-Abliterated/resolve/main/Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf) | Q4_K_M | ~16 GB | 4.92 | Best quality/size ratio |

```bash
# With llama-server
./build/bin/llama-server \
  -m Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf \
  -a qwopus35-27b-v3-abliterated \
  --host 127.0.0.1 --port 8080 \
  -ngl 99 -c 4096 -np 1 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek

# With llama-cli
./build/bin/llama-cli -m Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf -ngl 99 -c 4096 -p "Hello"
```

## Experimental Version (with Turboquant TQ3_4S)

A separate image of this model was quantized from the BF16 weights using the new experimental Turboquant3 scheme pioneered by [YTan2000](https://huggingface.co/YTan2000) and [Tom Turney](https://x.com/no_stp_on_snek), which applies the Google quant not just to the KV cache but also to the model weights:

| File | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [Qwopus3.5-27B-v3-Abliterated-TQ3_4S.gguf](https://huggingface.co/croll83/Qwopus3.5-27B-v3-Abliterated/resolve/main/Qwopus3.5-27B-v3-Abliterated-TQ3_4S.gguf) | TQ3_4S | ~13 GB | | Requires a fork of llama.cpp |

### Quantization Source

- HF source checkout: `croll83/Qwopus3.5-27B-v3-Abliterated`
- Upstream family: `Qwen/Qwen3.5-27B`
- F16 GGUF used as the quantization source: `Qwopus3.5-27B-v3-Abliterated-f16.gguf`

Quantized with:

```bash
./build/bin/llama-quantize \
  /path/to/Qwopus3.5-27B-v3-Abliterated-f16.gguf \
  /path/to/Qwopus3.5-27B-v3-Abliterated-TQ3_4S.gguf \
  TQ3_4S \
  8
```

### Recommended Chat Settings

For cleaner short-answer behavior on this reasoning-distilled model:

```
--reasoning on --reasoning-budget 0 --temp 0.6 --top-k 20 --min-p 0 --repeat-penalty 1.0
```

On simple prompts, this suppresses visible thinking-tag spill more reliably than `--reasoning off`.

### Runtime Validation

Validated on a clean public checkout of [turbo-tan/llama.cpp-tq3](https://github.com/turbo-tan/llama.cpp-tq3) at `main`:

- **Runtime commit:** `62eb27dce`
- **Smoke test prompt:** `Write ONLY the word ok.` → response: `ok`

### Notes

- This is a weight quantization release for the Qwopus v3 model line, abliterated.
- Running this GGUF requires the `TQ3_4S` runtime in `turbo-tan/llama.cpp-tq3`.

## Important Disclaimers

> **This model has reduced safety filtering and may generate content that is sensitive, controversial, or potentially harmful.**

- This model is intended for **research and experimental use only**
- **Not suitable** for public-facing applications or use by minors
- The user is **solely responsible** for ensuring legal and ethical compliance
- No default safety guarantees are provided
- Use at your own risk and discretion

## Credits

- **Base model:** [Jackrong/Qwopus3.5-27B-v3](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3)
- **Original architecture:** [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B)
- **Abliteration technique:** [Sumandora/remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers)
- **Harmful prompts:** [AdvBench](https://github.com/llm-attacks/llm-attacks)
- **Harmless prompts:** [Alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)
- **Turbo-Tan llama.cpp fork:** [llama.cpp-tq3](https://github.com/turbo-tan/llama.cpp-tq3)
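
## Appendix: How Refusal-Direction Ablation Works

For readers curious about the mechanics behind the Abliteration Details above, here is a minimal NumPy sketch of the general technique: extract a refusal direction as the normalized mean difference of hidden states on harmful vs. harmless prompts, then project that direction out of weight matrices that write into the residual stream (e.g. `o_proj`, `down_proj`). This is an illustrative toy, not the actual pipeline used for this release; the function names, shapes, and data are hypothetical.

```python
import numpy as np

def refusal_direction(h_harmful: np.ndarray, h_harmless: np.ndarray) -> np.ndarray:
    """Unit vector along the mean activation difference at a chosen layer.

    h_harmful / h_harmless: (n_prompts, d_model) hidden states, e.g.
    collected at the last token position of each prompt.
    """
    diff = h_harmful.mean(axis=0) - h_harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix whose output
    lands in the residual stream.

    W: (d_model, d_in), r: (d_model,) unit vector.
    Returns (I - r r^T) @ W, so the layer can no longer write along r.
    """
    return W - np.outer(r, r @ W)

# Toy demo: pretend dimension 0 of an 8-dim residual stream encodes refusal.
rng = np.random.default_rng(0)
d_model = 8
r_true = np.zeros(d_model)
r_true[0] = 1.0
h_harmless = rng.normal(size=(512, d_model))
h_harmful = h_harmless + 5.0 * r_true  # harmful activations shifted along r_true

r = refusal_direction(h_harmful, h_harmless)
W = rng.normal(size=(d_model, d_model))
W_ablated = ablate(W, r)

# The ablated weight has (numerically) zero output component along r.
print(np.abs(r @ W_ablated).max())
```

In the real procedure described above, the same projection is applied to `self_attn.o_proj`, `linear_attn.o_proj`, and `mlp.down_proj` across layers 2-61, using the direction extracted at layer 61.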