Solegon
/

prompt-safety-bert

Text Classification

jailbreak-detection

prompt-injection

content-moderation

Eval Results (legacy)

text-embeddings-inference

Model card Files Files and versions

🛡️ Prompt Safety BERT

A fine-tuned DistilBERT model for safe/unsafe prompt classification.

Downloads last month: 37

Safetensors

Model size

67M params

Tensor type

F32

·

Datasets used to train Solegon/prompt-safety-bert

Evaluation results

F1
self-reported

0.960