two modes
- README.md +155 -0
- custom_generate/generate.py +7 -2
README.md
CHANGED
@@ -18,10 +18,14 @@ DeepCONF monitors the confidence of generated tokens and stops generation when c

## Parameters

- `enable_conf` (bool): Whether to enable the DeepCONF strategy. Defaults to `False`.
- `enable_early_stopping` (bool): Whether to apply early stopping during generation (online mode) or only track confidences for post-processing (batch mode). Defaults to `True`.
- `window_size` (int): Size of the sliding window for confidence calculation. Defaults to `2048`.
- `threshold` (float): Confidence threshold for early stopping. Defaults to `17.0`.
- `conf_topk` (int): Number of top tokens from the full vocabulary used in the confidence calculation. Defaults to `20`.
- `output_confidences` (bool): If `True` and `return_dict_in_generate=True`, returns a per-step confidence tensor alongside the generated sequences for debugging/visualization.
- `deepconf_variant` (str): Optional variant for automatic threshold calibration (`"low"` or `"high"`). Requires `deepconf_warmup_confidences`.
- `deepconf_warmup_confidences` (list or tensor): Warmup confidence values for threshold calibration. Used together with `deepconf_variant`.
- `deepconf_eta` (float): Optional override for the eta value in the threshold calculation (defaults: `0.1` for `"low"`, `0.9` for `"high"`); see the example after this list.
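
The calibration parameters are passed through `GenerationConfig` like the others. A minimal sketch, assuming warmup confidences have already been collected (the numeric values below are invented for illustration; the workflow example later in this README shows how to collect real ones):

```python
from transformers import GenerationConfig

# Hypothetical warmup min-confidences gathered from a warmup batch
warmup_confs = [15.2, 16.8, 14.9, 17.5, 16.1, 15.7, 18.0, 16.4]

gen_config = GenerationConfig(
    enable_conf=True,
    deepconf_variant="low",                    # auto-calibrate the threshold
    deepconf_warmup_confidences=warmup_confs,  # required together with deepconf_variant
    deepconf_eta=0.1,                          # optional; 0.1 is already the "low" default
    window_size=2048,
    max_new_tokens=512,
)
```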

## Usage

@@ -158,6 +162,157 @@ print(f"Generated: {tokenizer.decode(outputs.sequences[0], skip_special_tokens=T

- **DeepConf-low** (eta=0.1): Uses the 90th percentile of warmup min-confidences as the threshold → more aggressive early stopping
- **DeepConf-high** (eta=0.9): Uses the 10th percentile → more permissive, allows longer generation (the eta-to-percentile mapping is sketched right after this list)
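
The mapping above can be computed directly; a minimal sketch of the calibration step (the repo's `compute_warmup_threshold()` plays this role; the standalone helper below is illustrative):

```python
import numpy as np

def percentile_threshold(warmup_min_confs, eta):
    """Map eta to a stopping threshold over warmup trace min-confidences.

    eta=0.1 (DeepConf-low)  -> 90th percentile (aggressive stopping)
    eta=0.9 (DeepConf-high) -> 10th percentile (permissive)
    """
    return float(np.percentile(np.asarray(warmup_min_confs), 100 * (1 - eta)))
```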

### Two Modes of Operation

DeepConf supports two modes that match different use cases:

#### Mode 1: Online Early Stopping (Default)

This is the default behavior where early stopping happens **during** generation:

```python
# Online mode: Stop immediately when confidence drops
gen_config = GenerationConfig(
    enable_conf=True,
    enable_early_stopping=True,  # Default: True (online stopping)
    threshold=17.0,
    window_size=2048,
    max_new_tokens=512,
)

outputs = model.generate(**inputs, generation_config=gen_config, custom_generate="kashif/DeepConf")
```

**Use cases:**

- Interactive generation where you want immediate results
- Real-time applications
- Single-sequence generation
- Lower memory usage (no need to store full sequences)

#### Mode 2: Batch Generation + Post-Processing

Generate multiple sequences without early stopping, then analyze them afterward:

```python
import torch

# Phase 1: Generate multiple sequences WITHOUT early stopping
gen_config = GenerationConfig(
    enable_conf=True,
    enable_early_stopping=False,  # Disable online stopping
    output_confidences=True,
    return_dict_in_generate=True,
    max_new_tokens=64,
)

# Expand inputs for batch generation (e.g., 8 sequences)
num_sequences = 8
expanded_input_ids = inputs.input_ids.repeat(num_sequences, 1)
if 'attention_mask' in inputs and inputs.attention_mask is not None:
    expanded_attention_mask = inputs.attention_mask.repeat(num_sequences, 1)
else:
    expanded_attention_mask = None

# Generate batch
outputs = model.generate(
    input_ids=expanded_input_ids,
    attention_mask=expanded_attention_mask,
    generation_config=gen_config,
    custom_generate="kashif/DeepConf",
)

# Phase 2: Post-process to analyze confidence patterns
from custom_generate.utils import process_batch_results

results = process_batch_results(
    outputs,
    tokenizer,
    window_size=2048,
    threshold=17.0,
)

# Analyze results
print(f"Generated {results['num_traces']} sequences")
print(f"Min confidences: {results['min_confs']}")

for i, trace in enumerate(results['traces']):
    print(f"\nSequence {i+1}:")
    print(f"  Text: {trace['text'][:100]}...")
    print(f"  Min confidence: {trace['min_conf']:.3f}")
    print(f"  Would stop early: {trace['stopped_early']}")
    if trace['stopped_early']:
        print(f"  Stop position: {trace['stop_position']}")
```

**Use cases:**

- Research and experimentation (try different thresholds without regenerating)
- Batch serving (generate multiple candidates at once)
- Analysis and voting, as in the official implementation (see the voting sketch after this list)
- Calibration and threshold tuning
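
A minimal sketch of confidence-filtered majority voting over the batch traces, assuming the `results` dict from `process_batch_results()` above, and assuming `extract_answer()` (listed below) returns the parsed answer or `None`; the helper name `majority_vote` is illustrative:

```python
from collections import Counter

from custom_generate.utils import extract_answer


def majority_vote(traces, threshold):
    # Keep only traces whose minimum windowed confidence clears the threshold
    kept = [t for t in traces if t['min_conf'] >= threshold]
    answers = [extract_answer(t['text']) for t in kept]
    answers = [a for a in answers if a is not None]
    # Majority vote over the surviving extracted answers, if any
    return Counter(answers).most_common(1)[0][0] if answers else None


print(majority_vote(results['traces'], threshold=17.0))
```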

**Utility Functions:**

The `custom_generate/utils.py` module provides helper functions:

- `process_batch_results()`: Analyze batch outputs to detect early-stopping positions
- `analyze_early_stopping()`: Calculate statistics on early-stopping behavior
- `compute_warmup_threshold()`: Derive a threshold from warmup confidences
- `extract_answer()`: Parse LaTeX `\boxed{answer}` patterns (an illustrative sketch follows this list)
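
An illustrative standalone version of the `\boxed{...}` parsing (the repo's `extract_answer()` is the real helper; this regex handles only non-nested braces):

```python
import re

def extract_boxed(text):
    """Return the contents of the last LaTeX \\boxed{...} in text, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

assert extract_boxed(r"The answer is \boxed{42}.") == "42"
```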

#### Complete Workflow Example (Like Official DeepConf)

This demonstrates the full workflow matching the official implementation:

```python
# Step 1: Warmup phase - generate multiple sequences
warmup_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    max_new_tokens=64,
    enable_conf=True,
    enable_early_stopping=False,  # No stopping during warmup
    output_confidences=True,
    return_dict_in_generate=True,
)

# Expand for 8 warmup sequences
num_warmup = 8
expanded_ids = inputs.input_ids.repeat(num_warmup, 1)
expanded_mask = inputs.attention_mask.repeat(num_warmup, 1) if 'attention_mask' in inputs else None

warmup_outputs = model.generate(
    input_ids=expanded_ids,
    attention_mask=expanded_mask,
    generation_config=warmup_config,
    custom_generate="kashif/DeepConf",
)

# Process warmup to get min confidences
from custom_generate.utils import process_batch_results, compute_warmup_threshold

warmup_results = process_batch_results(warmup_outputs, tokenizer, window_size=10)
print(f"Warmup min confidences: {warmup_results['min_confs']}")

# Step 2: Compute threshold from warmup
threshold = compute_warmup_threshold(
    warmup_results['min_confs'],
    variant="low",  # or "high"
)
print(f"Calibrated threshold: {threshold:.3f}")

# Step 3: Final generation with calibrated threshold
final_config = GenerationConfig(
    enable_conf=True,
    enable_early_stopping=True,  # Online stopping with calibrated threshold
    threshold=threshold,
    window_size=10,
    max_new_tokens=128,
)

final_output = model.generate(**inputs, generation_config=final_config, custom_generate="kashif/DeepConf")
print(tokenizer.decode(final_output.sequences[0], skip_special_tokens=True))
```

## Technical Details

### Confidence Calculation
custom_generate/generate.py
CHANGED

@@ -52,6 +52,7 @@ def generate(
    # Get DeepCONF parameters from generation_config or set defaults
    enable_conf = getattr(generation_config, "enable_conf", False)
    enable_early_stopping = getattr(generation_config, "enable_early_stopping", True)  # NEW: Allow disabling early stopping
    window_size = getattr(generation_config, "window_size", 2048)
    threshold = getattr(
        generation_config, "threshold", 17.0

@@ -263,6 +264,10 @@ def generate(

            # Get top-k tokens from full probability distribution
            top_probs, _ = torch.topk(probs[i], k=conf_topk, dim=-1)
            # Add epsilon for numerical stability (prevent log(0) = -inf)
            # Use 1e-7 for float16 compatibility (float16 min ~6e-8)
            eps = torch.finfo(top_probs.dtype).eps if top_probs.dtype == torch.float32 else 1e-7
            top_probs = torch.clamp(top_probs, min=eps)
            log_probs = torch.log(top_probs)
            # Confidence is negative mean of log probabilities of top-k tokens
            conf = -log_probs.mean().item()

@@ -273,8 +278,8 @@ def generate(

            conf_group_lists[i].append(conf)
            conf_grouped_sums[i] += conf

            # Apply confidence-based early stopping when window is full (only if enabled)
            if enable_early_stopping and len(conf_group_lists[i]) >= window_size:
                avg_conf = conf_grouped_sums[i] / len(conf_group_lists[i])
                if avg_conf < threshold:
                    deepconf_stopping[i] = False
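
For intuition, a toy rerun of the arithmetic above outside the generation loop (all numbers invented):

```python
import torch

# Per-token confidence: negative mean log-prob of the top-k probabilities
probs = torch.tensor([0.97, 0.01, 0.01, 0.005, 0.005])  # toy next-token distribution
top_probs, _ = torch.topk(probs, k=5)
conf = -torch.log(top_probs).mean().item()  # ~3.97 for this toy head

# Sliding-window early-stopping check with invented confidences
window_size, threshold = 4, 17.0
confs = [18.2, 17.5, 16.0, 15.1]  # last window_size per-token confidences
avg_conf = sum(confs) / len(confs)  # 16.7 < 17.0 -> mark this sequence for stopping
```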