fine-tune #52
by m-hasnain-sabqi - opened
- .eval_results/MathArena--aime_2026.yaml +0 -8
- .eval_results/MathArena--hmmt_feb_2026.yaml +0 -8
- .eval_results/swe_bench_verified.yaml +0 -19
- .eval_results/terminal_bench.yaml +0 -11
- .eval_results/terminal_bench_2.yaml +0 -10
- .eval_results/yc-bench.yaml +0 -9
- README.md +3 -16
- chat_template.jinja +2 -2
.eval_results/MathArena--aime_2026.yaml
DELETED
@@ -1,8 +0,0 @@
-- dataset:
-    id: MathArena/aime_2026
-  task_id: MathArena/aime_2026
-  value: 95.83
-  date: '2026-02-18'
-  source:
-    url: https://matharena.ai/?comp=aime--aime_2026
-    name: Official MathArena Evaluation
.eval_results/MathArena--hmmt_feb_2026.yaml
DELETED
@@ -1,8 +0,0 @@
-- dataset:
-    id: MathArena/hmmt_feb_2026
-  task_id: MathArena/hmmt_feb_2026
-  value: 86.36
-  date: '2026-02-23'
-  source:
-    url: https://matharena.ai/?comp=hmmt--hmmt_feb_2026
-    name: Official MathArena Evaluation
.eval_results/swe_bench_verified.yaml
DELETED
@@ -1,19 +0,0 @@
-- dataset:
-    id: SWE-bench/SWE-bench_Verified
-  task_id: swe_bench_%_resolved
-  value: 72.80
-  source:
-    url: https://www.swebench.com/
-    name: SWE-Bench official evaluation
-  user: nielsr
-  notes: high reasoning, official
-
-- dataset:
-    id: SWE-bench/SWE-bench_Verified
-  task_id: swe_bench_%_resolved
-  value: 77.8
-  source:
-    url: https://huggingface.co/zai-org/GLM-5/
-    name: Model card
-  user: nielsr
-  notes: Z.ai reported number
.eval_results/terminal_bench.yaml
DELETED
@@ -1,11 +0,0 @@
-- dataset:
-    id: harborframework/terminal-bench-2.0
-  task_id: terminal_bench
-  value: 52.4
-  date: '2026-02-23'
-  source:
-    url: https://www.tbench.ai/leaderboard/terminal-bench/2.0
-    name: Terminal-Bench Leaderboard
-  user: burtenshaw
-  notes: "agent: Terminus 2"
-
.eval_results/terminal_bench_2.yaml
DELETED
@@ -1,10 +0,0 @@
-- dataset:
-    id: harborframework/terminal-bench-2.0
-  task_id: terminalbench_2
-  value: 52.4
-  date: '2026-02-23'
-  source:
-    url: https://www.tbench.ai/leaderboard/terminal-bench/2.0
-    name: Terminal-Bench Leaderboard
-  user: SaylorTwift
-  notes: "agent: Terminus 2"
.eval_results/yc-bench.yaml
DELETED
@@ -1,9 +0,0 @@
-- dataset:
-    id: collinear-ai/yc-bench
-  task_id: medium
-  value: 1208190
-  date: "2026-03-24"
-  source:
-    url: https://github.com/collinear-ai/yc-bench
-    name: "YC-Bench eval"
-  notes: "avg final funds (USD) across seeds 1,2,3. GLM-5 (via OpenRouter z-ai/glm-5)"
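The deleted `.eval_results/*.yaml` files all follow one small record schema: a nested `dataset.id`, plus `task_id`, `value`, an optional `date`, a `source` with `url` and `name`, and optional `user`/`notes` fields. A minimal sketch of that schema in Python, with one record from the diffs above inlined as a dict; the `validate` helper is hypothetical, not part of the repository:

```python
# Illustrative sketch: one eval-result record from the deleted
# .eval_results/MathArena--aime_2026.yaml, inlined as a Python dict.
# The validate() helper is hypothetical, not part of the repo.
RECORD = {
    "dataset": {"id": "MathArena/aime_2026"},
    "task_id": "MathArena/aime_2026",
    "value": 95.83,
    "date": "2026-02-18",
    "source": {
        "url": "https://matharena.ai/?comp=aime--aime_2026",
        "name": "Official MathArena Evaluation",
    },
}

def validate(record: dict) -> tuple:
    """Check the required fields and return (dataset id, score)."""
    required = {"dataset", "task_id", "value", "source"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record["dataset"]["id"], record["value"]

print(validate(RECORD))  # -> ('MathArena/aime_2026', 95.83)
```

The `date`, `user`, and `notes` fields vary between the deleted files, so the sketch treats them as optional.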
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 language:
-- en
-- zh
+- en
+- zh
 library_name: transformers
 license: mit
 pipeline_tag: text-generation
@@ -22,11 +22,6 @@ pipeline_tag: text-generation
 👉 One click to <a href="https://chat.z.ai">GLM-5</a>.
 </p>
 
-<p align="center">
-[<a href="https://huggingface.co/papers/2602.15763" target="_blank">Paper</a>]
-[<a href="https://github.com/zai-org/GLM-5" target="_blank">GitHub</a>]
-</p>
-
 ## Introduction
 
 We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
@@ -154,12 +149,4 @@ vLLM, SGLang, KTransformers, and xLLM all support local deployment of GLM-5. A s
 
 ## Citation
 
-```
-@article{glm5team2026glm5,
-  title={GLM-5: from Vibe Coding to Agentic Engineering},
-  author={GLM-5 Team and Aohan Zeng and Xin Lv and Zhenyu Hou and Zhengxiao Du and Qinkai Zheng and Bin Chen and Da Yin and Chendi Ge and Chengxing Xie and others},
-  journal={arXiv preprint arXiv:2602.15763},
-  year={2026},
-  url={https://huggingface.co/papers/2602.15763}
-}
-```
+Our technical report is coming soon.
chat_template.jinja
CHANGED
@@ -32,10 +32,10 @@ For each function call, output the function name and arguments within the follow
 {%- set ns = namespace(last_user_index=-1) %}
 {%- for m in messages %}
 {%- if m.role == 'user' %}
-{%
+{% set ns.last_user_index = loop.index0 -%}
 {%- endif %}
 {%- endfor %}
-{%
+{% for m in messages %}
 {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
 {%- elif m.role == 'assistant' -%}
 <|assistant|>
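The template change above restores two statements that were truncated to a bare `{%`: recording the index of the last user turn in a Jinja `namespace` (a plain `{% set %}` inside a `for` loop would not persist across iterations), then re-opening the loop over `messages` to render each turn. The last-user-index logic, mirrored in plain Python for illustration (the helper name is ours, not part of the template):

```python
def last_user_index(messages: list) -> int:
    """Mirror of the fixed Jinja logic: ns = namespace(last_user_index=-1),
    then ns.last_user_index = loop.index0 for every message whose role is 'user'.
    Returns -1 when there is no user message."""
    idx = -1
    for i, m in enumerate(messages):
        if m["role"] == "user":
            idx = i
    return idx

messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "thanks"},
]
print(last_user_index(messages))  # -> 2
```

Templates typically use this index to mark where the final user prompt begins, e.g. to decide which turns keep their reasoning content.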