# DeBERTa Secret Detection Model

This model detects secrets in git diff lines.

## Model Details

- Base Model: microsoft/deberta-v3-base
- Task: Binary sequence classification
- Labels:
  - LABEL_0: Normal
  - LABEL_1: Secret

## Training

- Loss: Weighted cross-entropy
- Metric for best model: F1
- BF16 training
- Gradient checkpointing enabled
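
The weighted cross-entropy loss mentioned above can be sketched in pure Python (a minimal illustration of the normalization `torch.nn.CrossEntropyLoss` uses with a `weight` argument; the class weights and logits shown here are hypothetical, not the values used in training):

```python
import math

def weighted_cross_entropy(logits, targets, weights):
    """Mean class-weighted cross-entropy, normalized by the summed
    weights of the target classes (matching torch.nn.CrossEntropyLoss)."""
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, targets):
        # Stable log-sum-exp, then negative log-softmax of the target class
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += weights[y] * (log_z - row[y])
        weight_sum += weights[y]
    return total / weight_sum

# Two classes: 0 = Normal, 1 = Secret. Up-weighting the rarer
# "Secret" class penalizes missed secrets more heavily.
logits = [[2.0, 0.5], [0.2, 1.5]]   # model scores per example
targets = [0, 1]                    # true labels
print(weighted_cross_entropy(logits, targets, [1.0, 5.0]))
```

With equal weights this reduces to the ordinary mean cross-entropy; raising the weight on LABEL_1 shifts the loss toward errors on the "Secret" class.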

## Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="hypn05/secret-detector-deberta-base"
)
classifier("+ password='secret123'")
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def main():
    model_name = "hypn05/secret-detector-deberta-base"

    # 1. Set up the device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device.upper()}")

    # 2. Load the tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        # 'eager' is the standard attention implementation for DeBERTa-v2/v3
        attn_implementation="eager"
    ).to(device)

    # 3. Optional optimization: torch.compile
    # Fuses operations into optimized kernels. The first run is slow
    # while it compiles, but subsequent runs are much faster.
    if device == "cuda":
        try:
            model = torch.compile(model)
            print("Model compiled for maximum performance.")
        except Exception as e:
            print(f"Skipping compilation: {e}")

    model.eval()

    # 4. Batch input data
    examples = [
        "password='secret123'",                    # Represents a secret
        "print('Hello, world!')",                  # Not a secret
        "ACCESS_TOKEN_REDACTED_VALUE",             # Not a secret
        "def calculate_sum(a, b): return a + b",   # Not a secret
        "DATABASE_CONNECTION_STRING_PLACEHOLDER",  # Not a secret
        "The weather is nice today."               # Not a secret
    ]

    # 5. Tokenization
    inputs = tokenizer(
        examples,
        padding=True,
        truncation=True,
        return_tensors="pt"
    ).to(device)

    # 6. Inference
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=-1)

    # 7. Map results to labels
    print("-" * 60)
    for text, pred_idx in zip(examples, predictions):
        # Access the config on the base model if torch.compile wrapped it
        actual_model = model._orig_mod if hasattr(model, "_orig_mod") else model
        label = actual_model.config.id2label[pred_idx.item()]
        display_text = (text[:47] + "...") if len(text) > 50 else text
        print(f"[{label:^10}] | {display_text}")

if __name__ == "__main__":
    main()
```

## Intended Use

Production secret detection in git diffs.
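
When scanning a git diff, typically only the added lines are classified. A minimal sketch of extracting candidate lines from a unified diff (the classifier call itself is omitted; the candidates would be passed to the pipeline from the Usage section):

```python
def added_lines(diff_text):
    """Yield lines added in a unified diff, stripping the '+' prefix
    and skipping '+++' file headers."""
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:].strip()

diff = """\
--- a/config.py
+++ b/config.py
@@ -1,2 +1,3 @@
 DEBUG = True
+password='secret123'
+print('Hello, world!')
"""

candidates = list(added_lines(diff))
print(candidates)  # these lines would be fed to the classifier
```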

## Model Tree

- Base model: microsoft/deberta-v3-base