How to benchmark MMMU properly in SGLang?
#49
by JacobChang - opened
For GLM-5/4.7 in SGLang:
Launch the server:
```shell
python3 -m sglang.launch_server \
  --model /Path/to/zai-org/GLM-4.7 \
  --tp 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45
```
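Before running the full benchmark, it can help to sanity-check that the server is up and that the reasoning parser actually separates the thinking trace from the final answer. Below is a minimal sketch that builds a chat-completion payload for SGLang's OpenAI-compatible endpoint (served on port 30000 by default); the model name and prompt are placeholders, not values from this post.

```python
import json

def build_probe_request(model="zai-org/GLM-4.7", max_tokens=64):
    """Build a minimal chat-completion payload to probe the server.

    The model name here is a placeholder; use whatever name your
    server reports under /v1/models.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }

# To actually send it (requires the server launched above):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:30000/v1/chat/completions",
#       data=json.dumps(build_probe_request()).encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

If the response's content field still contains the raw thinking trace, the reasoning parser is not being applied, which would directly hurt MMMU answer extraction.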
Benchmark:
```shell
python /sgl-workspace/sglang/benchmark/mmmu/bench_sglang.py \
  --port 30000 --concurrency 900 --parallel 900 \
  --temperature 0 \
  --max-new-tokens 131072
```
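One common cause of a large MMMU gap with thinking models is the answer extractor scoring the reasoning trace instead of the final choice letter. As a hypothetical diagnostic (this is not the benchmark script's actual logic), you could re-grade a few saved responses with a lenient extractor that takes the last standalone option letter:

```python
import re

def extract_choice(text):
    """Return the last standalone option letter (A-J) in the text,
    or None if no such letter appears. A lenient, illustrative
    extractor for eyeballing whether responses end in a clear choice."""
    matches = re.findall(r"\b([A-J])\b", text)
    return matches[-1] if matches else None
```

If many responses yield `None` (e.g. they were truncated mid-thought despite the large max-new-tokens, or the final answer never surfaced outside the thinking block), the low score is an extraction artifact rather than a model regression.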
The reported accuracy is about 0.55, but the previous model GLM-4.5V (with Thinking) achieved 0.754, as recorded at https://mmmu-benchmark.github.io/
That is a huge gap between the two numbers. Any ideas what could cause it?