DeBERTa Secret Detection Model

This model classifies individual git diff lines as containing a secret (e.g. a password, API key, or token) or not.

Model Details

  • Base Model: microsoft/deberta-base
  • Task: Binary sequence classification
  • Labels:
    • LABEL_0: Normal
    • LABEL_1: Secret

Training

  • Loss: Weighted cross-entropy
  • Metric for best model: F1
  • BF16 training
  • Gradient checkpointing enabled
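The weighted cross-entropy objective up-weights the rarer Secret class; the actual class weights used in training are not published, so the 1.0/5.0 weights below are illustrative only. A minimal sketch of the loss, matching PyTorch's weighted-mean reduction:

```python
import math

def weighted_cross_entropy(logits, labels, class_weights):
    """Mean weighted cross-entropy over a batch.

    Matches PyTorch's reduction for F.cross_entropy(weight=...):
    the weighted NLL sum is divided by the sum of the applied weights.
    """
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # stabilize the softmax
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_z - row[y]  # -log softmax(row)[y]
        total += class_weights[y] * nll
        weight_sum += class_weights[y]
    return total / weight_sum

# Illustrative weights: up-weight LABEL_1 (Secret) 5x to counter class imbalance.
loss = weighted_cross_entropy(
    logits=[[2.0, 0.5], [0.2, 1.8]],
    labels=[0, 1],
    class_weights=[1.0, 5.0],
)
```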

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="hypn05/secret-detector-deberta-base"
)

classifier("+ password='secret123'")
# Returns a list of dicts, e.g. [{'label': 'LABEL_1', 'score': ...}]
# when the line is flagged as a secret.
For batched inference with explicit control over device, dtype, and compilation:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def main():
    model_name = "hypn05/secret-detector-deberta-base"
    
    # 1. Setup Device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device.upper()}")

    # 2. Load Model with Correct Parameters
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        # float16 halves memory on GPU; CPU inference stays in float32
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        # 'eager' is the standard attention implementation for DeBERTa
        attn_implementation="eager"
    ).to(device)
    
    # 3. Optional optimization: torch.compile
    # Fuses operations into optimized kernels (it does not replace attention itself).
    # The first call is slow while compiling; subsequent calls are much faster.
    if device == "cuda":
        try:
            model = torch.compile(model)
            print("Model compiled for maximum performance.")
        except Exception as e:
            print(f"Skipping compilation: {e}")

    model.eval()

    # 4. Batch Input Data
    examples = [
        "password='secret123'",                            # Represents a secret
        "print('Hello, world!')",                          # Not a secret
        "ACCESS_TOKEN_REDACTED_VALUE",                     # Not a secret
        "def calculate_sum(a, b): return a + b",           # Not a secret
        "DATABASE_CONNECTION_STRING_PLACEHOLDER",          # Not a secret
        "The weather is nice today."                       # Not a secret        
    ]

    # 5. Tokenization
    inputs = tokenizer(
        examples, 
        padding=True, 
        truncation=True, 
        return_tensors="pt"
    ).to(device)

    # 6. Inference
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=-1)

    # 7. Map Results
    print("-" * 60)
    for text, pred_idx in zip(examples, predictions):
        # Accessing config from the base model if compiled
        actual_model = model._orig_mod if hasattr(model, '_orig_mod') else model
        label = actual_model.config.id2label[pred_idx.item()]
        
        display_text = (text[:47] + '...') if len(text) > 50 else text
        print(f"[{label:^10}] | {display_text}")

if __name__ == "__main__":
    main()
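The script above takes a hard argmax over the two logits; a production filter may instead want to gate on the Secret-class probability. A minimal sketch of that thresholding (the 0.9 threshold is an illustrative choice, not a published operating point):

```python
import math

def secret_probability(logits):
    """Softmax probability of the Secret class (index 1) from a 2-way logit pair."""
    a, b = logits
    m = max(a, b)  # stabilize the softmax
    ea, eb = math.exp(a - m), math.exp(b - m)
    return eb / (ea + eb)

def flag_line(logits, threshold=0.9):
    # Illustrative threshold; tune on validation data for the target precision/recall.
    return secret_probability(logits) >= threshold
```

Raising the threshold trades recall for precision, which is usually the right direction for a noisy pre-commit gate.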

Intended Use

Flagging hard-coded secrets (passwords, API keys, tokens) in git diff lines, for example in pre-commit hooks or CI review pipelines.
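Since the model scores individual diff lines, a scanner would typically extract only the added lines from a unified diff and classify each one. A sketch of that extraction (the hunk and file handling here is deliberately minimal):

```python
def added_lines(diff_text):
    """Return added lines from a unified diff, keeping the leading '+'.

    The '+' prefix is kept because the model sees raw diff lines
    (see the pipeline example above). '+++' file headers are skipped.
    """
    return [
        line for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]

diff = """\
--- a/config.py
+++ b/config.py
@@ -1,2 +1,3 @@
 DEBUG = False
+password='secret123'
-old_value = 1
"""
candidates = added_lines(diff)
```

Each candidate line can then be passed to the classifier from the Usage section.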

Model size

  • ~0.2B parameters (F32, Safetensors)