Gated model: this repository is publicly accessible, but you must accept the conditions to access its files and content. Licensed CC BY-NC-SA 4.0 (non-commercial research and educational use only).
⚠️ v6.0 is available with stronger homophone defense: thc1006/cyberpuppy-v6-bilingual (paired with thc1006/cyberpuppy-v6-pinyin-lora). v6 uses LoRA r=64 (vs v5's r=32) and gains +1.91 pt on HED-COLD, +0.14 pt on TC homophone, and +0.19 pt on COLD at α=0.60. v5.1.1 still holds a slightly higher PCR-ToxiCN F1 (0.7162 vs 0.7119); prefer v5 if PCR is your priority.
CyberPuppy v5 — Chinese Cyberbullying & Toxicity Detection (Dual-LoRA)
State-of-the-art Chinese toxicity detection that defends against homophone attacks, number substitution, letter replacement, and creative obfuscation used on real social media platforms.
🏆 Exceeds published SOTA on PCR-ToxiCN (real-world RedNote/小紅書 posts): F1 0.6890 vs prior best 0.672
🛡️ 97.2% homophone invariance — immune to 「勾史」=「狗屎」, 「四調」=「死掉」 style attacks
🌐 Bilingual — handles both Traditional (繁體) and Simplified (简体) Chinese natively
Model Description
CyberPuppy v5 is a dual-LoRA ensemble for Chinese toxic content detection. It uses two specialized LoRA adapters on the same Qwen3-8B backbone:
| Component | Role | Input |
|---|---|---|
| This model (LoRA-A) | Text understanding | Original Chinese text |
| LoRA-B (pinyin) | Phonetic invariance | Toneless pinyin conversion |
The ensemble combines the two branches with weights 0.75 (text) and 0.25 (pinyin), pairing semantic understanding with phonetic robustness. Since v5.1 the blend is a weighted geometric mean of the two probability distributions rather than the linear mix 0.75 × text_probs + 0.25 × pinyin_probs, which makes the system highly resistant to homophone-based evasion attacks.
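The weighted geometric-mean blend can be sketched in a few lines. This is a pure-Python illustration on toy probability vectors (the real pipeline operates on torch tensors, as shown in the Quick Start); the function name `blend` and the example probabilities are ours.

```python
# Sketch of the v5.1 ensemble blend (weights from the model card: 0.75 text, 0.25 pinyin).
def blend(text_probs, pinyin_probs, w_text=0.75, w_pinyin=0.25):
    """Weighted geometric mean of two probability distributions, renormalized."""
    raw = [t ** w_text * p ** w_pinyin for t, p in zip(text_probs, pinyin_probs)]
    total = sum(raw)
    return [x / total for x in raw]

# Text branch is confident "toxic"; pinyin branch is less sure:
labels = ["none", "toxic", "severe"]
probs = blend([0.1, 0.8, 0.1], [0.4, 0.5, 0.1])
print(labels[max(range(3), key=probs.__getitem__)])  # toxic
```

The geometric mean penalizes disagreement: a class only scores high when both branches assign it non-trivial probability, which is exactly the behavior you want against single-branch evasion.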
Tasks (4-head multi-task)
| Task | Labels | Description |
|---|---|---|
| Toxicity | none / toxic / severe | Primary: is this text harmful? |
| Bullying | none / harassment / threat | Type of cyberbullying behavior |
| Role | none / perpetrator / victim / bystander | Speaker's role in bullying |
| Emotion | pos / neu / neg | Emotional valence |
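The label sets above can be collected into a small decode helper. Note the head key "toxicity" appears in the Quick Start; the other three key names below are our assumptions about how the heads are keyed in heads.pt.

```python
# Label sets for the four task heads, taken from the table above.
LABELS = {
    "toxicity": ["none", "toxic", "severe"],
    "bullying": ["none", "harassment", "threat"],
    "role": ["none", "perpetrator", "victim", "bystander"],
    "emotion": ["pos", "neu", "neg"],
}

def decode(head, class_index):
    """Map a head's argmax index to its string label."""
    return LABELS[head][class_index]

print(decode("role", 2))  # victim
```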
Benchmark Results
vs Published Methods
| Method | COLD F1 | PCR-ToxiCN F1 | TC Homo F1 |
|---|---|---|---|
| COLD baseline (paper) | 0.78 | — | — |
| PCR-ToxiCN SOTA (paper) | — | 0.672 | — |
| Qwen3Guard zero-shot | 0.746 | — | — |
| CyberPuppy v5 (ours) | 0.8336 | 0.6890 | 0.8380 |
Robustness to Evasion Attacks
| Attack Type | Example | Defense |
|---|---|---|
| Homophone substitution | 「勾史」→「狗屎」 | ✅ Pinyin LoRA sees identical input |
| Number substitution | 「4了」→「死了」 | ✅ Bilingual training exposure |
| Letter substitution | 「装X」→「装逼」 | ✅ CNTP adversarial training |
| Creative slang | 「密碼」→「你媽」 | ⚠️ Partially handled |
| English phonetic | "funny mud pee" | ⚠️ Limited coverage |
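Why the pinyin branch neutralizes the first attack row can be shown with a toy example. The mini lookup table below is ours, purely for demonstration; the real model converts the full CJK range with pypinyin.

```python
# Toy illustration: homophone swaps collapse to the same toneless pinyin.
PINYIN = {"勾": "gou", "史": "shi", "狗": "gou", "屎": "shi"}

def toneless(text):
    """Replace each known character with its toneless pinyin syllable."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

# The obfuscated string 「勾史」 and the original 「狗屎」 become identical input:
print(toneless("勾史") == toneless("狗屎"))  # True
```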
Quick Start
```python
import re

import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel
from pypinyin import Style, pinyin
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda")
dtype = torch.bfloat16

# Tokenizer (shared by both branches)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")

# LoRA-A (text) — this model
base_a = AutoModel.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype=dtype, device_map=device)
model_a = PeftModel.from_pretrained(base_a, "thc1006/cyberpuppy-v5-bilingual", subfolder="lora")

# LoRA-B (pinyin) — companion model
base_b = AutoModel.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype=dtype, device_map=device)
model_b = PeftModel.from_pretrained(base_b, "thc1006/cyberpuppy-v5-pinyin-lora", subfolder="lora")

# Classification heads
heads_a = torch.load(hf_hub_download("thc1006/cyberpuppy-v5-bilingual", "heads.pt"),
                     map_location=device, weights_only=False)
heads_b = torch.load(hf_hub_download("thc1006/cyberpuppy-v5-pinyin-lora", "heads.pt"),
                     map_location=device, weights_only=False)

# Pinyin converter: CJK ideographs → toneless pinyin, everything else passed through
_HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")

def to_pinyin(text):
    return " ".join(
        pinyin(ch, style=Style.NORMAL)[0][0] if _HAN.match(ch) else ch
        for ch in text if ch.strip()
    )

# Inference
text = "你這個笨蛋,滾開!"
pinyin_text = to_pinyin(text)
enc_t = tok(text, return_tensors="pt", truncation=True, max_length=192).to(device)
enc_p = tok(pinyin_text, return_tensors="pt", truncation=True, max_length=192).to(device)

with torch.inference_mode():
    # Last-token pooling over each branch's hidden states
    h_t = model_a(**enc_t).last_hidden_state[:, -1]
    h_p = model_b(**enc_p).last_hidden_state[:, -1]
    logits_t = heads_a["heads"]["toxicity"](h_t.float())
    logits_p = heads_b["heads"]["toxicity"](h_p.float())

# Geometric-mean ensemble (v5.1 — better than the linear blend on all benchmarks)
probs = (logits_t.softmax(-1) ** 0.75) * (logits_p.softmax(-1) ** 0.25)
probs = probs / probs.sum(-1, keepdim=True)  # re-normalize

labels = ["none", "toxic", "severe"]
pred = labels[probs.argmax(-1).item()]
print(f"{text} → {pred}")  # toxic
```
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B-Base |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | All linear (q/k/v/o/gate/up/down) |
| Training data | 179,186 samples |
| Epochs | 3 |
| Learning rate | 3e-5 |
| Batch size | 6 (× 6 gradient accumulation steps = effective 36) |
| Max length | 192 tokens |
| Precision | bf16 |
| Loss | Focal (γ=2.5) + uncertainty multi-task + consistency (λ=0.5) |
| Hardware | 1× NVIDIA RTX 5090 (32GB) |
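The focal loss in the table down-weights easy examples so training concentrates on hard ones. A minimal sketch for a single example, assuming `p_true` is the softmax probability assigned to the true class (the multi-task uncertainty weighting and consistency terms are omitted):

```python
import math

# Focal loss with gamma = 2.5, per the training table above.
def focal_loss(p_true, gamma=2.5):
    """(1 - p)^gamma scales standard cross-entropy -log(p) toward zero
    for confident, correct predictions."""
    return (1.0 - p_true) ** gamma * -math.log(p_true)

# A confidently-correct prediction contributes almost nothing...
print(focal_loss(0.95))
# ...while a badly-wrong one dominates the batch loss.
print(focal_loss(0.10))
```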
Training Data Composition
| Source | Records | Language | Purpose |
|---|---|---|---|
| COLD | 25,659 | Traditional Chinese | Base toxicity corpus |
| SCCD | 28,426 | Traditional Chinese | Session-level context |
| STATE-ToxiCN | 5,781 | Traditional Chinese | Hate slang vocabulary |
| ToxiCloakCN × 3 | 33,012 | Traditional Chinese | Adversarial triplets |
| All above (simplified) | 70,870 | Simplified Chinese | Bilingual coverage |
| CNTP | 15,438 | Mixed | Real perturbation pairs |
| Total | 179,186 | Bilingual | — |
Limitations and Bias
Known Limitations
- English input: Out of distribution. English-only text will produce unreliable results.
- Novel obfuscation: Creative attacks not seen in training (math puzzles like "64.5克黃金", new slang) may evade detection.
- Context length: Inputs longer than 192 tokens are truncated. Long-form content may lose critical context.
- Annotation bias: Trained primarily on COLD annotation guidelines, which may differ from other cultural contexts of toxicity.
- False positives on sarcasm/humor: Ironic or humorous usage of offensive terms may be flagged as toxic.
- ToxiCloakCN drop metric: −7.41% relative drop vs ≤5% target. Absolute performance (F1 0.8380) is strong but relative metric not met.
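One workaround for the 192-token truncation limit (our suggestion, not a feature of the released model) is to score overlapping windows and keep the most severe prediction. A sketch of the windowing step, with hypothetical sizes matching the training max length:

```python
# Overlapping windows over a tokenized input; each window fits the 192-token limit.
def chunk(tokens, size=192, stride=96):
    """Yield windows of `size` tokens, advancing by `stride` (50% overlap)."""
    for start in range(0, max(len(tokens) - stride, 1), stride):
        yield tokens[start:start + size]

toks = list(range(500))  # stand-in for real tokenizer output
windows = list(chunk(toks))
print(len(windows), len(windows[0]))  # 5 192
```

Each window would then be classified independently, with the document-level label taken as the maximum severity across windows.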
What This Model Cannot Do
- Cannot moderate images, audio, or video — text-only
- Cannot understand conversational context — classifies single messages in isolation
- Cannot detect implicit bias or microaggressions — focused on explicit toxicity
- Cannot replace human moderators — designed as an assistive tool, not autonomous censor
- Cannot handle code-switching with non-CJK languages (e.g., mixed Thai-Chinese)
Ethical Considerations
- Dual-use risk: Could be misused to generate evasion strategies. Mitigated by CC BY-NC-SA license.
- Cultural sensitivity: Toxicity norms vary across Chinese-speaking regions (PRC, Taiwan, HK, Singapore). Model trained primarily on Taiwanese/Hong Kong norms.
- Privacy: Model does not store or transmit input text. Deployment should hash/anonymize user data.
- Over-censorship: False positives can silence legitimate speech. We recommend human-in-the-loop for final moderation decisions.
Recommended Use
✅ Intended for:
- Academic research on Chinese online safety
- Educational tools for cyberbullying awareness
- Content moderation assistive tools (with human review)
- Benchmark development for adversarial robustness
❌ Not intended for:
- Autonomous censorship without human oversight
- Surveillance of political speech
- Commercial content moderation (requires Apache 2.0 relicensing)
- Cross-lingual toxicity detection (Chinese-only)
LLM Cascade (Experimental)
We tested a Qwen3-8B Instruct cascade on disagreement samples (where text and pinyin LoRAs disagree). Results:
| Strategy | COLD | PCR | TC Homo |
|---|---|---|---|
| Ensemble only (recommended) | 0.8336 | 0.6890 | 0.8380 |
| + Full LLM cascade | −0.0185 | +0.0182 | −0.0105 |
| + Asymmetric (toxic-only) | −0.0022 | +0.0150 | −0.0125 |
Conclusion: LLM cascade helps on real-world creative obfuscation (PCR) but hurts clean benchmarks. Not adopted as default. Available as optional deployment flag for social-media-like scenarios.
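The asymmetric ("toxic-only") routing rule can be sketched as follows. `llm_judge` is a placeholder for a Qwen3-8B Instruct call, which is not part of this repository; the function and argument names are ours.

```python
# Asymmetric cascade: escalate to the LLM only when the two LoRA branches
# disagree AND the ensemble already leans toxic; otherwise trust the ensemble.
def route(text_pred, pinyin_pred, ensemble_pred, llm_judge):
    if text_pred != pinyin_pred and ensemble_pred != "none":
        return llm_judge()
    return ensemble_pred

# Agreement: no LLM call is made.
print(route("toxic", "toxic", "toxic", lambda: "severe"))  # toxic
# Disagreement on a toxic-leaning ensemble: escalate to the LLM.
print(route("toxic", "none", "toxic", lambda: "severe"))   # severe
```

Restricting escalation to toxic-leaning disagreements is what limits the clean-benchmark regression in the table above while keeping most of the PCR gain.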
Citation
@misc{cyberpuppy_v5_2026,
author = {Tsai, Hung-Che},
title = {CyberPuppy v5: Bilingual Dual-LoRA Ensemble for Chinese Cyberbullying Detection with Homophone Robustness},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/thc1006/cyberpuppy-v5-bilingual}},
note = {Dual-LoRA ensemble with pinyin branch for adversarial robustness}
}
Related Models
| Model | Purpose |
|---|---|
| thc1006/cyberpuppy-v5-pinyin-lora | Companion pinyin LoRA (required for ensemble) |
| thc1006/cyberpuppy-v2.2-adapter | Previous version (deprecated) |
Contact & Takedown
- Author: Hung-Che Tsai (hctsai1006@cs.nctu.edu.tw)
- Takedown: email the address above; content will be removed within 7 days
- Issues: GitHub
Evaluation results
- F1 (weighted) on COLD test set: 0.834 (self-reported)
- Accuracy on COLD test set: 0.832 (self-reported)
- F1 (weighted) on PCR-ToxiCN: 0.689 (self-reported; exceeds SOTA 0.672)
- Accuracy on PCR-ToxiCN: 0.698 (self-reported)
- F1 (homophone, absolute) on ToxiCloakCN held-out test set: 0.838 (self-reported)