---
license: apache-2.0
base_model:
- Jackrong/Qwopus3.5-27B-v3
tags:
- qwen3_5
- qwopus
- abliterated
- uncensored
- Claude
- reasoning
- chain-of-thought
- conversational
- gguf
language:
- en
- it
- zh
pipeline_tag: text-generation
---

# Qwopus3.5-27B-v3-Abliterated

This is an **uncensored/abliterated** version of [Jackrong/Qwopus3.5-27B-v3](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3), a Claude 4.6 Opus reasoning-distilled fine-tune of Qwen3.5-27B.

Abliteration removes the model's refusal behavior without retraining, by contrasting activations on harmful vs. harmless prompts. The technique is based on [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers).

Inspired by the work of [HuiHui-AI](https://huggingface.co/huihui-ai).

## Abliteration Details

- **Method:** Refusal-direction ablation via activation contrast
- **Harmful prompts:** 512 from AdvBench (520 pool)
- **Harmless prompts:** 512 from Alpaca-cleaned (31.8K pool)
- **Refusal direction:** Layer 61/64 (strongest separation, norm: 158.28)
- **Ablated layers:** 2-61 (60 layers, skipping the first 2 and last 2)
- **Ablated weights:** `self_attn.o_proj`, `linear_attn.o_proj`, `mlp.down_proj` (75 matrices modified)
- **Format:** BF16 safetensors (same as the source model)

## Model Details

| Property | Value |
|---|---|
| Base Model | [Jackrong/Qwopus3.5-27B-v3](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3) |
| Architecture | Qwen3.5 (hybrid attention + GatedDeltaNet) |
| Parameters | ~28B |
| Context Length | 131,072 tokens |
| Format | BF16 Safetensors + GGUF (F16, Q4_K_M) |
| License | Apache 2.0 |

## Usage (standard BF16/GGUF)

### With transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "croll83/Qwopus3.5-27B-v3-Abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("croll83/Qwopus3.5-27B-v3-Abliterated")
messages = [{"role": "user", "content": "Hello, how are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### With vLLM

```bash
vllm serve croll83/Qwopus3.5-27B-v3-Abliterated --dtype bfloat16
```

### With llama.cpp (GGUF)

Two GGUF versions are provided in this repo:

| File | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [Qwopus3.5-27B-v3-Abliterated-f16.gguf](https://huggingface.co/croll83/Qwopus3.5-27B-v3-Abliterated/resolve/main/Qwopus3.5-27B-v3-Abliterated-f16.gguf) | F16 | ~54 GB | 16.0 | Full precision, lossless |
| [Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf](https://huggingface.co/croll83/Qwopus3.5-27B-v3-Abliterated/resolve/main/Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf) | Q4_K_M | ~16 GB | 4.92 | Best quality/size ratio |

```bash
# With llama-server
./build/bin/llama-server \
  -m Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf \
  -a qwopus35-27b-v3-abliterated \
  --host 127.0.0.1 --port 8080 \
  -ngl 99 -c 4096 -np 1 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek

# With llama-cli
./build/bin/llama-cli -m Qwopus3.5-27B-v3-Abliterated-Q4_K_M.gguf -ngl 99 -c 4096 -p "Hello"
```

## Experimental Version (with Turboquant TQ3_4S)

A separate image of this model was quantized from the BF16 weights using the new experimental Turboquant3 scheme pioneered by [YTan2000](https://huggingface.co/YTan2000) and [Tom Turney](https://x.com/no_stp_on_snek), which applies the Google quant not just to the KV cache but also to the model weights:

| File | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| [Qwopus3.5-27B-v3-Abliterated-TQ3_4S.gguf](https://huggingface.co/croll83/Qwopus3.5-27B-v3-Abliterated/resolve/main/Qwopus3.5-27B-v3-Abliterated-TQ3_4S.gguf) | TQ3_4S | ~13 GB | | Requires a fork of llama.cpp |

### Quantization Source

- HF source checkout: `croll83/Qwopus3.5-27B-v3-Abliterated`
- Upstream family: `Qwen/Qwen3.5-27B`
- F16 GGUF used as the quantization source: `Qwopus3.5-27B-v3-Abliterated-f16.gguf`

Quantized with:

```bash
./build/bin/llama-quantize \
  /path/to/Qwopus3.5-27B-v3-Abliterated-f16.gguf \
  /path/to/Qwopus3.5-27B-v3-Abliterated-TQ3_4S.gguf \
  TQ3_4S \
  8
```

### Recommended Chat Settings

For cleaner short-answer behavior on this reasoning-distilled model:

```
--reasoning on --reasoning-budget 0 --temp 0.6 --top-k 20 --min-p 0 --repeat-penalty 1.0
```

On simple prompts, this suppresses visible thinking-tag spill more reliably than `--reasoning off`.

### Runtime Validation

Validated on a clean public checkout of [turbo-tan/llama.cpp-tq3](https://github.com/turbo-tan/llama.cpp-tq3) at `main`:

- **Runtime commit:** `62eb27dce`
- **Smoke test prompt:** `Write ONLY the word ok.` → response: `ok`

### Notes

- This is a weight quantization release for the Qwopus v3 model line, abliterated.
- Running this GGUF requires the `TQ3_4S` runtime in `turbo-tan/llama.cpp-tq3`.

## Important Disclaimers

> **This model has reduced safety filtering and may generate content that is sensitive, controversial, or potentially harmful.**

- This model is intended for **research and experimental use only**
- **Not suitable** for public-facing applications or use by minors
- The user is **solely responsible** for ensuring legal and ethical compliance
- No default safety guarantees are provided
- Use at your own risk and discretion

## Credits

- **Base model:** [Jackrong/Qwopus3.5-27B-v3](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3)
- **Original architecture:** [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B)
- **Abliteration technique:** [Sumandora/remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers)
- **Harmful prompts:** [AdvBench](https://github.com/llm-attacks/llm-attacks)
- **Harmless prompts:** [Alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)
- **Turbo-Tan llama.cpp fork:** [llama.cpp-tq3](https://github.com/turbo-tan/llama.cpp-tq3)
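
## Appendix: How Refusal-Direction Ablation Works

For readers curious about the mechanics behind the Abliteration Details above, here is a minimal NumPy sketch of the general technique: extract a refusal direction as the normalized mean difference of hidden states on harmful vs. harmless prompts, then project that direction out of weight matrices that write into the residual stream (e.g. `o_proj`, `down_proj`). This is an illustrative toy, not the actual pipeline used for this release; the function names, shapes, and data are hypothetical.

```python
import numpy as np

def refusal_direction(h_harmful: np.ndarray, h_harmless: np.ndarray) -> np.ndarray:
    """Unit vector along the mean activation difference at a chosen layer.

    h_harmful / h_harmless: (n_prompts, d_model) hidden states, e.g.
    collected at the last token position of each prompt.
    """
    diff = h_harmful.mean(axis=0) - h_harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix whose output
    lands in the residual stream.

    W: (d_model, d_in), r: (d_model,) unit vector.
    Returns (I - r r^T) @ W, so the layer can no longer write along r.
    """
    return W - np.outer(r, r @ W)

# Toy demo: pretend dimension 0 of an 8-dim residual stream encodes refusal.
rng = np.random.default_rng(0)
d_model = 8
r_true = np.zeros(d_model)
r_true[0] = 1.0
h_harmless = rng.normal(size=(512, d_model))
h_harmful = h_harmless + 5.0 * r_true  # harmful activations shifted along r_true

r = refusal_direction(h_harmful, h_harmless)
W = rng.normal(size=(d_model, d_model))
W_ablated = ablate(W, r)

# The ablated weight has (numerically) zero output component along r.
print(np.abs(r @ W_ablated).max())
```

In the real procedure described above, the same projection is applied to `self_attn.o_proj`, `linear_attn.o_proj`, and `mlp.down_proj` across layers 2-61, using the direction extracted at layer 61.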