Qwen2.5-Math-7B-GPG / README.md

xiao23451

license: apache-2.0
base_model:
  - Qwen/Qwen2.5-Math-7B

Model ID

GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

https://arxiv.org/abs/2504.02546

Model Details

The RL model (GPG-7B in the paper) was trained with GPG on the simple1r_qwen_level3to5 dataset, using Qwen2.5-Math-7B as the base model.
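A minimal sketch of how the checkpoint might be loaded for inference with Hugging Face `transformers`. This README does not state the repository namespace or the prompt format used in training, so `MODEL_ID`, `INSTRUCTION`, `build_prompt`, and `generate_answer` below are illustrative assumptions rather than the authors' evaluation setup. The boxed-answer instruction follows the common Qwen2.5-Math convention and may differ from the one used in the paper.

```python
# Hypothetical repo id: replace <namespace> with the account that hosts this model.
MODEL_ID = "<namespace>/Qwen2.5-Math-7B-GPG"

# Assumed instruction suffix (common Qwen2.5-Math convention, not confirmed by this README).
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_prompt(question: str) -> str:
    """Append the assumed reasoning instruction to a math question."""
    return f"{question.strip()}\n{INSTRUCTION}"


def generate_answer(question: str, model_id: str = MODEL_ID, max_new_tokens: int = 1024) -> str:
    """Load the model and greedily decode an answer for one question."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the `accelerate` package to be installed.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is the sum of the first 10 positive integers?")` downloads the weights on first use. Greedy decoding (`do_sample=False`) is chosen here only to reduce run-to-run variation.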

Attention!

Due to differences in environment and hardware, test results may fluctuate. Specifically, the average accuracy over five datasets (AIME24, AMC23, MATH-500, Minerva, and OlympiadBench) is 57.7 when tested on an NPU but 55.3 when tested on an H20 GPU. We consider this 2.4-point difference to be within an acceptable range.
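To compare your own runs against the numbers above, the following sketch shows one way to compute a five-benchmark average and the gap between two averages. It assumes the reported average is an unweighted mean over the five datasets, which this README does not state explicitly. The helper names `average_accuracy` and `accuracy_gap` are hypothetical, and no per-benchmark scores are given because the README reports only the averages.

```python
BENCHMARKS = ("AIME24", "AMC23", "MATH-500", "Minerva", "OlympiadBench")


def average_accuracy(scores: dict[str, float]) -> float:
    """Unweighted mean accuracy over the five benchmarks (assumed averaging scheme)."""
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    return sum(scores[name] for name in BENCHMARKS) / len(BENCHMARKS)


def accuracy_gap(reference_avg: float, other_avg: float) -> float:
    """Difference between two average accuracies, rounded to one decimal place."""
    return round(reference_avg - other_avg, 1)


# Averages reported in this README: 57.7 on an NPU, 55.3 on an H20 GPU.
gap = accuracy_gap(57.7, 55.3)
print(f"NPU vs. H20 average gap: {gap} points")
```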