arxiv:2510.01268
Jin Zhu
mamba413
models 11
mamba413/L2D
Updated • 1
mamba413/Qwen2.5-1.5B-PPO-DR-HH-Seed1
2B • Updated
mamba413/Qwen2.5-1.5B-PPO-BENCH-HH-Seed1
2B • Updated
mamba413/Qwen2.5-1.5B-Instruct-Reward-BENCH-HH-Seed1
2B • Updated • 1
mamba413/Qwen2.5-1.5B-Instruct-Reward-BENCH-HH-Seed0
Updated
mamba413/Qwen2.5-1.5B-Instruct-Reward-DR-HH-Seed0
Updated
mamba413/Qwen2-0.5B-Reward-DR-HH-Seed0
Text Classification • 0.5B • Updated • 2
mamba413/Qwen2.5-1.5B-Reward-DR-IMDB-Seed0
Updated
mamba413/Qwen2.5-1.5B-Reward-DR-SIMU-Seed0
Updated
mamba413/Qwen2-0.5B-Reward-DR-SIMU-Seed0
Text Classification • 0.5B • Updated • 4
datasets 8
mamba413/GenerateText_Qwen2.5-1.5B-Instruct_GRPO_HH_Seed1
Updated • 7.06k • 4
mamba413/GenerateText_HH_Seed1
Updated • 11.8k • 16
mamba413/GenerateText_HH_Seed1_new
Updated • 640 • 35
mamba413/RewardModel-BENCH-HH-Seed1
Updated • 64 • 5
mamba413/RewardModel-DR-HH-Seed1
Updated • 64 • 5
mamba413/train_data_imdb_simu_valid
Updated • 48.1k • 27
mamba413/train_data_imdb_simu
Updated • 48.1k • 23
mamba413/train_data_imdb
Updated • 2 • 6