
CC BY-NC-SA 4.0 — non-commercial research and educational use only.


⚠️ v6.0 is available with a stronger homophone defense: thc1006/cyberpuppy-v6-bilingual (paired with thc1006/cyberpuppy-v6-pinyin-lora). v6 uses LoRA r=64 (vs. v5's r=32) and gains +1.91 pt on HED-COLD, +0.14 pt on TC Homo, and +0.19 pt on COLD at α=0.60. v5.1.1 still holds a slightly higher PCR-ToxiCN F1 (0.7162 vs. 0.7119), so prefer v5 if PCR is your priority.

CyberPuppy v5 — Chinese Cyberbullying & Toxicity Detection (Dual-LoRA)

State-of-the-art Chinese toxicity detection that defends against homophone attacks, number substitution, letter replacement, and creative obfuscation used on real social media platforms.

🏆 Exceeds published SOTA on PCR-ToxiCN (real-world RedNote/小紅書 posts): F1 0.6890 vs prior best 0.672

🛡️ 97.2% homophone invariance — immune to 「勾史」=「狗屎」, 「四調」=「死掉」 style attacks

🌐 Bilingual — handles both Traditional (繁體) and Simplified (简体) Chinese natively

Model Description

CyberPuppy v5 is a dual-LoRA ensemble for Chinese toxic content detection. It uses two specialized LoRA adapters on the same Qwen3-8B backbone:

| Component | Role | Input |
|---|---|---|
| This model (LoRA-A) | Text understanding | Original Chinese text |
| LoRA-B (pinyin) | Phonetic invariance | Toneless pinyin conversion |

The ensemble combines the two branches with weights 0.75 (text) and 0.25 (pinyin). v5.0 blended probabilities linearly as 0.75 × text_probs + 0.25 × pinyin_probs; v5.1 switched to a weighted geometric mean with the same weights (see Quick Start). Pairing semantic understanding with phonetic robustness makes the system highly resistant to homophone-based evasion attacks.
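
As a toy sketch of the two blending rules (pure Python over hand-picked 3-class distributions; not the model's actual code path):

```python
def linear_blend(p_text, p_pinyin, w=0.75):
    # v5.0 rule: convex combination of the two branch distributions
    return [w * t + (1 - w) * p for t, p in zip(p_text, p_pinyin)]

def geometric_blend(p_text, p_pinyin, w=0.75):
    # v5.1 rule: weighted geometric mean, renormalized to sum to 1
    raw = [(t ** w) * (p ** (1 - w)) for t, p in zip(p_text, p_pinyin)]
    z = sum(raw)
    return [r / z for r in raw]

# Toy distributions over (none, toxic, severe)
p_text = [0.10, 0.70, 0.20]
p_pinyin = [0.30, 0.60, 0.10]
print(linear_blend(p_text, p_pinyin))
print(geometric_blend(p_text, p_pinyin))
```

One intuition for the geometric variant: a class must score reasonably under *both* branches to survive renormalization, which rewards cross-branch agreement.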

Tasks (4-head multi-task)

| Task | Labels | Description |
|---|---|---|
| Toxicity | none / toxic / severe | Primary: is this text harmful? |
| Bullying | none / harassment / threat | Type of cyberbullying behavior |
| Role | none / perpetrator / victim / bystander | Speaker's role in bullying |
| Emotion | pos / neu / neg | Emotional valence |

Benchmark Results

vs Published Methods

| Method | COLD F1 | PCR-ToxiCN F1 | TC Homo F1 |
|---|---|---|---|
| COLD baseline (paper) | 0.78 | | |
| PCR-ToxiCN SOTA (paper) | | 0.672 | |
| Qwen3Guard zero-shot | 0.746 | | |
| CyberPuppy v5 (ours) | 0.8336 | 0.6890 | 0.8380 |

Robustness to Evasion Attacks

| Attack Type | Example | Defense |
|---|---|---|
| Homophone substitution | 「勾史」→「狗屎」 | ✅ Pinyin LoRA sees identical input |
| Number substitution | 「4了」→「死了」 | ✅ Bilingual training exposure |
| Letter substitution | 「装X」→「装逼」 | ✅ CNTP adversarial training |
| Creative slang | 「密碼」→「你媽」 | ⚠️ Partially handled |
| English phonetic | "funny mud pee" | ⚠️ Limited coverage |
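
The homophone defense follows directly from the pinyin conversion: an evasion spelling and its toxic target share the same toneless pinyin, so the phonetic branch receives identical input for both. A minimal illustration, using a small hand-written character table as a stand-in for pypinyin:

```python
# Tiny stand-in table for pypinyin's toneless output (illustrative subset only).
TONELESS = {"勾": "gou", "史": "shi", "狗": "gou", "屎": "shi",
            "四": "si", "調": "diao", "死": "si", "掉": "diao"}

def toneless_pinyin(text):
    # Map each Han character to toneless pinyin; pass other characters through.
    return " ".join(TONELESS.get(ch, ch) for ch in text)

# The evasion spelling and the toxic target collapse to the same phonetic string.
assert toneless_pinyin("勾史") == toneless_pinyin("狗屎")  # both "gou shi"
assert toneless_pinyin("四調") == toneless_pinyin("死掉")  # both "si diao"
```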

Quick Start

```python
import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
from pypinyin import pinyin, Style
import re

device = torch.device("cuda")
dtype = torch.bfloat16

# Tokenizer (shared by both branches)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")

# LoRA-A (text) — this model
base_a = AutoModel.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype=dtype, device_map=device)
model_a = PeftModel.from_pretrained(base_a, "thc1006/cyberpuppy-v5-bilingual", subfolder="lora")

# LoRA-B (pinyin) — companion model
base_b = AutoModel.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype=dtype, device_map=device)
model_b = PeftModel.from_pretrained(base_b, "thc1006/cyberpuppy-v5-pinyin-lora", subfolder="lora")

# Classification heads
heads_a = torch.load(hf_hub_download("thc1006/cyberpuppy-v5-bilingual", "heads.pt"),
                     map_location=device, weights_only=False)
heads_b = torch.load(hf_hub_download("thc1006/cyberpuppy-v5-pinyin-lora", "heads.pt"),
                     map_location=device, weights_only=False)

# Pinyin converter: Han characters → toneless pinyin, everything else passes through
_HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")
def to_pinyin(text):
    return " ".join(
        pinyin(ch, style=Style.NORMAL)[0][0] if _HAN.match(ch)
        else ch for ch in text if ch.strip()
    )

# Inference
text = "你這個笨蛋,滾開!"  # "You idiot, get lost!"
pinyin_text = to_pinyin(text)

enc_t = tok(text, return_tensors="pt", truncation=True, max_length=192).to(device)
enc_p = tok(pinyin_text, return_tensors="pt", truncation=True, max_length=192).to(device)

with torch.inference_mode():
    h_t = model_a(**enc_t).last_hidden_state[:, -1]  # last-token pooling
    h_p = model_b(**enc_p).last_hidden_state[:, -1]
    logits_t = heads_a["heads"]["toxicity"](h_t.float())
    logits_p = heads_b["heads"]["toxicity"](h_p.float())

# Geometric mean ensemble (v5.1 — better than linear blend on ALL benchmarks)
probs = (logits_t.softmax(-1) ** 0.75) * (logits_p.softmax(-1) ** 0.25)
probs = probs / probs.sum(-1, keepdim=True)  # re-normalize
labels = ["none", "toxic", "severe"]
pred = labels[probs.argmax(-1).item()]
print(f"{text} → {pred}")  # toxic
```
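
The Quick Start decodes only the toxicity head, but the checkpoint carries all four tasks. Below is a hedged sketch of decoding every head from one pooled hidden state; the head keys are assumed to match the task names in `heads_a["heads"]`, and the heads here are random pure-Python stand-ins so the example runs without the 8B backbone:

```python
import random

# Label sets from the task table above.
LABELS = {
    "toxicity": ["none", "toxic", "severe"],
    "bullying": ["none", "harassment", "threat"],
    "role": ["none", "perpetrator", "victim", "bystander"],
    "emotion": ["pos", "neu", "neg"],
}

def decode_all_heads(heads, hidden):
    # Run every classification head on the same pooled hidden state
    # and take the argmax label per task.
    preds = {}
    for task, labels in LABELS.items():
        logits = heads[task](hidden)
        preds[task] = labels[max(range(len(logits)), key=logits.__getitem__)]
    return preds

# Stand-in linear heads and hidden state (hypothetical, for illustration only).
random.seed(0)
def make_head(n_out, dim=8):
    w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_out)]
    return lambda h: [sum(wi * hi for wi, hi in zip(row, h)) for row in w]

demo_heads = {t: make_head(len(l)) for t, l in LABELS.items()}
demo_hidden = [random.gauss(0, 1) for _ in range(8)]
print(decode_all_heads(demo_heads, demo_hidden))
```

In a real deployment you would pass `h_t.float()` (and the loaded `heads_a["heads"]` modules) instead of the stand-ins.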

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B-Base |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Target modules | All linear (q/k/v/o/gate/up/down) |
| Training data | 179,186 samples |
| Epochs | 3 |
| Learning rate | 3e-5 |
| Batch size | 6 (× 6 gradient accumulation) |
| Max length | 192 tokens |
| Precision | bf16 |
| Loss | Focal (γ=2.5) + uncertainty-weighted multi-task + consistency (λ=0.5) |
| Hardware | 1× NVIDIA RTX 5090 (32 GB) |
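
For reference, a minimal pure-Python sketch of the focal term only (the uncertainty-weighted multi-task and consistency terms are omitted; this illustrates the γ=2.5 down-weighting, not the training code itself):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def focal_loss(batch_logits, targets, gamma=2.5):
    # Focal loss: scale each example's cross-entropy by (1 - p_t)^gamma so
    # easy, well-classified examples contribute less and hard ones dominate.
    total = 0.0
    for logits, t in zip(batch_logits, targets):
        pt = softmax(logits)[t]  # probability assigned to the true class
        total += -((1 - pt) ** gamma) * math.log(pt)
    return total / len(targets)

# A confident, correct example vs. a near-uniform, hard one.
print(focal_loss([[4.0, -2.0, -2.0]], [0]))  # easy → tiny loss
print(focal_loss([[0.1, 0.0, -0.1]], [1]))   # hard → much larger loss
```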

Training Data Composition

| Source | Records | Language | Purpose |
|---|---|---|---|
| COLD | 25,659 | Traditional Chinese | Base toxicity corpus |
| SCCD | 28,426 | Traditional Chinese | Session-level context |
| STATE-ToxiCN | 5,781 | Traditional Chinese | Hate slang vocabulary |
| ToxiCloakCN × 3 | 33,012 | Traditional Chinese | Adversarial triplets |
| All above (simplified) | 70,870 | Simplified Chinese | Bilingual coverage |
| CNTP | 15,438 | Mixed | Real perturbation pairs |
| Total | 179,186 | Bilingual | |

Limitations and Bias

Known Limitations

  1. English input: Out of distribution. English-only text will produce unreliable results.
  2. Novel obfuscation: Creative attacks not seen in training (math puzzles like "64.5克黃金", new slang) may evade detection.
  3. Context length: Inputs longer than 192 tokens are truncated. Long-form content may lose critical context.
  4. Annotation bias: Trained primarily on COLD annotation guidelines, which may differ from other cultural contexts of toxicity.
  5. False positives on sarcasm/humor: Ironic or humorous usage of offensive terms may be flagged as toxic.
  6. ToxiCloakCN drop metric: −7.41% relative drop vs. the ≤5% target. Absolute performance (F1 0.8380) is strong, but the relative-robustness target was not met.
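
For limitation 3, one common workaround (sketched here as an assumption, not a supported feature of this model) is to split long inputs into overlapping 192-token windows, classify each, and flag the document by its most toxic window:

```python
def sliding_windows(token_ids, max_len=192, stride=96):
    # Overlapping windows so no span longer than max_len is silently dropped.
    if len(token_ids) <= max_len:
        return [token_ids]
    return [token_ids[i:i + max_len]
            for i in range(0, len(token_ids) - max_len + stride, stride)]

def aggregate_toxic(window_probs):
    # Flag the document by its most toxic window (max-pooling over windows).
    return max(window_probs)

windows = sliding_windows(list(range(400)), max_len=192, stride=96)
print([(w[0], w[-1]) for w in windows])
# → [(0, 191), (96, 287), (192, 383), (288, 399)]
print(aggregate_toxic([0.1, 0.8, 0.3]))  # → 0.8
```

Max-pooling biases toward recall; averaging over windows would trade some recall for fewer false positives on long, mostly benign texts.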

What This Model Cannot Do

  • Cannot moderate images, audio, or video — text-only
  • Cannot understand conversational context — classifies single messages in isolation
  • Cannot detect implicit bias or microaggressions — focused on explicit toxicity
  • Cannot replace human moderators — designed as an assistive tool, not autonomous censor
  • Cannot handle code-switching with non-CJK languages (e.g., mixed Thai-Chinese)

Ethical Considerations

  • Dual-use risk: Could be misused to generate evasion strategies. Mitigated by CC BY-NC-SA license.
  • Cultural sensitivity: Toxicity norms vary across Chinese-speaking regions (PRC, Taiwan, HK, Singapore). Model trained primarily on Taiwanese/Hong Kong norms.
  • Privacy: Model does not store or transmit input text. Deployment should hash/anonymize user data.
  • Over-censorship: False positives can silence legitimate speech. We recommend human-in-the-loop for final moderation decisions.

Recommended Use

Intended for:

  • Academic research on Chinese online safety
  • Educational tools for cyberbullying awareness
  • Content moderation assistive tools (with human review)
  • Benchmark development for adversarial robustness

Not intended for:

  • Autonomous censorship without human oversight
  • Surveillance of political speech
  • Commercial content moderation (requires Apache 2.0 relicensing)
  • Cross-lingual toxicity detection (Chinese-only)

LLM Cascade (Experimental)

We tested a Qwen3-8B Instruct cascade on disagreement samples (where text and pinyin LoRAs disagree). Results:

| Strategy | COLD | PCR | TC Homo |
|---|---|---|---|
| Ensemble only (recommended) | 0.8336 | 0.6890 | 0.8380 |
| + Full LLM cascade | −0.0185 | +0.0182 | −0.0105 |
| + Asymmetric (toxic-only) | −0.0022 | +0.0150 | −0.0125 |

Conclusion: LLM cascade helps on real-world creative obfuscation (PCR) but hurts clean benchmarks. Not adopted as default. Available as optional deployment flag for social-media-like scenarios.
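
The routing logic behind the asymmetric variant can be sketched as follows; the LLM judge is stubbed out, so this illustrates the strategy rather than the shipped implementation:

```python
def asymmetric_cascade(ensemble_pred, text_pred, pinyin_pred, llm_judge):
    # "Asymmetric (toxic-only)" sketch: consult the LLM only when the two
    # LoRA branches disagree AND the ensemble already leans toxic, so
    # clean-looking inputs never pay the LLM latency cost.
    # llm_judge stands in for a Qwen3-8B Instruct call (not implemented here).
    if text_pred != pinyin_pred and ensemble_pred != "none":
        return llm_judge()
    return ensemble_pred

# Branches disagree on a toxic-leaning input: escalate, LLM verdict wins.
print(asymmetric_cascade("toxic", "toxic", "none", lambda: "none"))   # none
# Branches agree: keep the ensemble verdict, no LLM call.
print(asymmetric_cascade("none", "none", "none", lambda: "toxic")) # none
```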

Citation

```bibtex
@misc{cyberpuppy_v5_2026,
  author       = {Tsai, Hung-Che},
  title        = {CyberPuppy v5: Bilingual Dual-LoRA Ensemble for Chinese Cyberbullying Detection with Homophone Robustness},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/thc1006/cyberpuppy-v5-bilingual}},
  note         = {Dual-LoRA ensemble with pinyin branch for adversarial robustness}
}
```

Related Models

| Model | Purpose |
|---|---|
| thc1006/cyberpuppy-v5-pinyin-lora | Companion pinyin LoRA (required for ensemble) |
| thc1006/cyberpuppy-v2.2-adapter | Previous version (deprecated) |

Contact & Takedown
