# OLMoE-UProp: Mixture-of-Experts with Uncertainty-Guided Routing
## Overview
OLMoE-UProp is an enhanced version of allenai/OLMoE-1B-7B-0924 with UProp (Uncertainty-guided Probabilistic Routing) for Mixture-of-Experts models. This is a new model architecture that extends OLMoE with advanced routing mechanisms.
## Key Features

- **New Model Type**: `olmoe_uprop`, automatically recognized by Transformers
- **Full Weight Compatibility**: uses the original OLMoE-1B-7B weights
- **UProp Routing**: two methods for uncertainty-aware expert selection
- **Drop-in Replacement**: works as standard OLMoE when UProp is disabled
- **Uncertainty Tracking**: monitor routing decisions during inference
## Model Architecture

- Parameters: 1B active / 7B total
- Hidden Size: 2,048 (16 layers)
- Experts: 64 experts, top-8 routing
- Vocabulary: 50,304 tokens
- Context Length: 4,096 tokens
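These values can be read straight off the published config. A quick check, assuming the placeholder repo id used throughout this card and the attribute names of the upstream `OlmoeConfig`:

```python
from transformers import AutoConfig

# Print the architecture fields listed above (attribute names follow
# the upstream OlmoeConfig that olmoe_uprop extends).
config = AutoConfig.from_pretrained("your-username/olmoe-uprop")
print(config.num_hidden_layers)        # 16 layers
print(config.num_experts)              # 64 experts
print(config.num_experts_per_tok)      # top-8 routing
print(config.vocab_size)               # 50304
print(config.max_position_embeddings)  # 4096-token context
```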
## Quick Start

### Installation

```bash
pip install transformers torch
```
### Basic Usage (Standard Routing)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model (UProp disabled by default - works like standard OLMoE)
model = AutoModelForCausalLM.from_pretrained("your-username/olmoe-uprop")
tokenizer = AutoTokenizer.from_pretrained("your-username/olmoe-uprop")

# Generate text
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Enable UProp Routing

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Load config and enable UProp
config = AutoConfig.from_pretrained("your-username/olmoe-uprop")
config.use_uprop_routing = "uncertainty_aware"  # or "tdp"
config.uprop_uncertainty_threshold = 0.5

# Load model with UProp
model = AutoModelForCausalLM.from_pretrained(
    "your-username/olmoe-uprop",
    config=config,
)
tokenizer = AutoTokenizer.from_pretrained("your-username/olmoe-uprop")

# Generate
inputs = tokenizer("Explain quantum computing:", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## UProp Routing Methods
### Method A: Uncertainty-Aware Routing

Dynamically adjusts the number of selected experts based on intrinsic and extrinsic uncertainty.

```python
config.use_uprop_routing = "uncertainty_aware"
config.uprop_uncertainty_threshold = 0.5  # Threshold for adaptation
config.uprop_history_size = 10            # History buffer size
```

How it works:

1. Computes Intrinsic Uncertainty (IU) from the entropy of the routing probabilities
2. Computes Extrinsic Uncertainty (EU) from the divergence between the current routing and recent routing history
3. Adaptively selects more experts when uncertainty is high (see the sketch below)
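The exact computation ships with the model code; the following is only a minimal sketch of how IU and EU could be derived, assuming a softmax router and a rolling buffer of recent routing distributions. The function names (`routing_uncertainty`, `adaptive_top_k`) and the `max_k=16` cap are illustrative, not part of the model's API.

```python
import torch
import torch.nn.functional as F

def routing_uncertainty(router_logits, history, eps=1e-9):
    """Illustrative IU/EU computation; not the model's exact code."""
    probs = F.softmax(router_logits, dim=-1)  # routing distribution over experts
    # Intrinsic Uncertainty (IU): entropy of the routing distribution,
    # normalized to [0, 1] by the maximum entropy log(num_experts).
    entropy = -(probs * (probs + eps).log()).sum()
    iu = entropy / torch.log(torch.tensor(float(probs.numel())))
    # Extrinsic Uncertainty (EU): mean KL divergence between the current
    # distribution and each distribution stored in the history buffer.
    if history:
        kls = [F.kl_div((past + eps).log(), probs, reduction="sum") for past in history]
        eu = torch.stack(kls).mean()
    else:
        eu = torch.tensor(0.0)
    return iu, eu

# Hypothetical adaptation rule: widen top-k when mean uncertainty is high.
def adaptive_top_k(iu, eu, base_k=8, max_k=16, threshold=0.5):
    return max_k if 0.5 * (iu + eu) > threshold else base_k
```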
### Method B: TDP (Trajectory-Dependent Probabilistic) Routing

Samples multiple routing trajectories and falls back to an ensemble of them when uncertainty is high.

```python
config.use_uprop_routing = "tdp"
config.uprop_tdp_num_samples = 5          # Number of trajectory samples
config.uprop_tdp_temperature = 0.8        # Sampling temperature
config.uprop_uncertainty_threshold = 0.5  # Ensemble threshold
```

How it works:

1. Samples multiple routing trajectories at the configured temperature
2. Computes trajectory diversity as an uncertainty measure
3. Uses an ensemble of the trajectories when uncertainty exceeds the threshold (see the sketch below)
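As with Method A, this is a sketch under stated assumptions rather than the shipped implementation; the Gumbel noise model and the `tdp_route` name are illustrative.

```python
import torch
import torch.nn.functional as F

def tdp_route(router_logits, num_samples=5, temperature=0.8,
              top_k=8, threshold=0.5):
    """Illustrative TDP routing; not the model's exact code."""
    # 1. Sample several routing trajectories at the given temperature
    #    (Gumbel perturbation is one plausible noise model).
    gumbel = lambda: -torch.log(-torch.log(torch.rand_like(router_logits)))
    samples = torch.stack([
        F.softmax((router_logits + gumbel()) / temperature, dim=-1)
        for _ in range(num_samples)
    ])  # (num_samples, num_experts)
    # 2. Trajectory diversity as the uncertainty measure: average
    #    per-expert variance across the sampled distributions.
    uncertainty = samples.var(dim=0).mean()
    # 3. Ensemble the trajectories when uncertainty exceeds the threshold,
    #    otherwise keep the deterministic router distribution.
    if uncertainty > threshold:
        probs = samples.mean(dim=0)
    else:
        probs = F.softmax(router_logits, dim=-1)
    top_probs, top_experts = probs.topk(top_k)
    return top_experts, top_probs / top_probs.sum()
```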
## Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `use_uprop_routing` | `str` | `None` | Routing method: `"uncertainty_aware"`, `"tdp"`, or `None` |
| `uprop_uncertainty_threshold` | `float` | `0.5` | Uncertainty threshold for adaptation (0.0-1.0) |
| `uprop_history_size` | `int` | `10` | History buffer size (`uncertainty_aware` only) |
| `uprop_tdp_num_samples` | `int` | `5` | Number of trajectory samples (`tdp` only) |
| `uprop_tdp_temperature` | `float` | `0.8` | Sampling temperature (`tdp` only) |
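All five parameters are plain attributes on the config object, so they can be set together before loading the model (the values below are just the defaults from the table):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("your-username/olmoe-uprop")
config.use_uprop_routing = "tdp"          # or "uncertainty_aware", or None
config.uprop_uncertainty_threshold = 0.5  # shared by both methods
config.uprop_history_size = 10            # "uncertainty_aware" only
config.uprop_tdp_num_samples = 5          # "tdp" only
config.uprop_tdp_temperature = 0.8        # "tdp" only
```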
## Advanced Usage

### Uncertainty Tracking

```python
# Get per-layer uncertainty information from a forward pass
outputs = model(
    input_ids=inputs.input_ids,
    return_uncertainty_info=True,
)

# Access uncertainty stats
if hasattr(outputs, "uncertainty_info"):
    for layer_idx, info in enumerate(outputs.uncertainty_info):
        print(f"Layer {layer_idx}: IU={info['IU']:.4f}, EU={info['EU']:.4f}")
```
### Batch Inference

```python
prompts = [
    "The capital of France is",
    "Machine learning is",
    "In the future, we will",
]

# Padded batches need a pad token; fall back to EOS if none is set,
# and left-pad, which is what decoder-only generation expects.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=50)
for i, output in enumerate(outputs):
    print(f"{i+1}. {tokenizer.decode(output, skip_special_tokens=True)}")
```
### Clear Routing History

```python
# Clear UProp routing history between different tasks
if hasattr(model, "model") and hasattr(model.model, "clear_uprop_history"):
    model.model.clear_uprop_history()
```
## Performance

UProp routing can improve model performance on tasks requiring:

- Adaptive reasoning: complex multi-step problems
- Uncertainty handling: ambiguous or out-of-distribution inputs
- Dynamic complexity: tasks with varying difficulty levels
## Model Conversion

This model was converted from allenai/OLMoE-1B-7B-0924 by:

1. Loading the original OLMoE weights
2. Creating the new `olmoe_uprop` model type
3. Adding the UProp routing mechanisms
4. Maintaining full weight compatibility
## Citation

If you use this model, please cite:

```bibtex
@article{olmoe2024,
  title={OLMoE: Open Mixture-of-Experts Language Models},
  author={Muennighoff, Niklas and others},
  journal={arXiv preprint arXiv:2409.02060},
  year={2024}
}
```
## License

Apache 2.0 (same as the original OLMoE)
## Acknowledgments

- Based on allenai/OLMoE
- UProp routing implementation
- The Hugging Face Transformers team
## Links

- Original Model: allenai/OLMoE-1B-7B-0924
- Documentation: see `usage_example.py` for more examples
- Issues: report issues on the model repository
Made with ❤️ by the OLMoE-UProp team