---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
tags:
- generated_from_trainer
- sft
- summarization
datasets:
- trl-lib/tldr
language:
- en
library_name: transformers
---
|
|
|
|
|
# Qwen2.5-7B Fine-tuned on tldr |
|
|
|
|
|
This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) on the [trl-lib/tldr](https://huggingface.co/datasets/trl-lib/tldr) dataset of Reddit posts paired with TL;DR summaries.
|
|
|
|
|
## Training Results |
|
|
|
|
|
 |
|
|
|
|
|
### Training Statistics |
|
|
|
|
|
| Metric | Value |
|--------|-------|
| Total Steps | 1312 |
| Final Training Loss | 2.2743 |
| Min Training Loss | 2.2423 |
| Training Runtime | 1363.49 seconds |
| Samples/Second | 61.55 |
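
These figures are mutually consistent with the effective batch size of 64 listed in the configuration below; a quick sanity check using only the numbers reported above:

```python
total_steps = 1312
effective_batch_size = 64      # 16 per device x 4 GPUs x 1 accumulation step
runtime_seconds = 1363.49

samples_seen = total_steps * effective_batch_size   # 83,968 samples processed
print(samples_seen / runtime_seconds)               # ~61.6, consistent with the reported 61.55
```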
|
|
|
|
|
## Training Configuration |
|
|
|
|
|
| Parameter | Value |
|-----------|-------|
| Base Model | Qwen/Qwen2.5-7B |
| Dataset | trl-lib/tldr |
| Number of Epochs | 1.0 |
| Per Device Batch Size | 16 |
| Gradient Accumulation Steps | 1 |
| Total Batch Size | 64 (4 GPUs) |
| Learning Rate | 2e-05 |
| LR Scheduler | cosine |
| Warmup Ratio | 0.1 |
| Max Sequence Length | 512 |
| Optimizer | adamw_torch_fused |
| Mixed Precision | BF16 |
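
The exact training script is not included here; the following is a minimal sketch of how these hyperparameters map onto TRL's `SFTConfig` and `SFTTrainer` (note that the sequence-length argument is named `max_seq_length` in older TRL releases and `max_length` in newer ones):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

training_args = SFTConfig(
    output_dir="Qwen2.5-7B_tldr",
    num_train_epochs=1.0,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_seq_length=512,          # `max_length` in recent TRL versions
    optim="adamw_torch_fused",
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",     # SFTTrainer accepts a model id string
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```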
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "activeDap/Qwen2.5-7B_tldr"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# The model was trained on TL;DR-style prompt-completion pairs,
# so end the prompt with "TL;DR:" to elicit a summary.
prompt = (
    "SUBREDDIT: r/learnmachinelearning\n"
    "TITLE: Where do I start with machine learning?\n"
    "POST: I keep hearing about machine learning at work and want to pick up "
    "the basics on my own. What should I learn first?\n"
    "TL;DR:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the summary continuation
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
|
|
|
|
|
## Training Framework |
|
|
|
|
|
- **Library:** Transformers + TRL
- **Training Type:** Supervised Fine-Tuning (SFT)
- **Format:** Prompt-completion with completion-only loss (see the sketch below)
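
Concretely, each trl-lib/tldr example is a prompt-completion pair, and only the completion tokens contribute to the loss. A minimal sketch of the masking (field contents and token ids are illustrative):

```python
# One training example in prompt-completion format (contents abbreviated).
example = {
    "prompt": "SUBREDDIT: r/...\nTITLE: ...\nPOST: ...\nTL;DR:",
    "completion": " One-sentence summary of the post.",
}

# Completion-only loss: labels for prompt tokens are set to -100 so the
# cross-entropy loss is computed over the completion tokens alone.
prompt_ids = [101, 102, 103]        # token ids of example["prompt"] (illustrative)
completion_ids = [201, 202, 203]    # token ids of example["completion"] (illustrative)
input_ids = prompt_ids + completion_ids
labels = [-100] * len(prompt_ids) + completion_ids
```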
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite the original base model and dataset: |
|
|
|
|
|
```bibtex
@misc{stiennon2020learning,
  title={Learning to summarize from human feedback},
  author={Nisan Stiennon and Long Ouyang and Jeff Wu and Daniel M. Ziegler and Ryan Lowe and Chelsea Voss and Alec Radford and Dario Amodei and Paul Christiano},
  year={2020},
  eprint={2009.01325},
  archivePrefix={arXiv}
}

@misc{qwen2.5,
  title={Qwen2.5 Technical Report},
  author={Qwen Team},
  year={2024},
  eprint={2412.15115},
  archivePrefix={arXiv}
}
```
|
|
|