Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke_52k_cotif-ood-v7 • Text Generation • 8B • Updated Aug 16 • 6
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke_52k_cotif-v6-mv2 • Text Generation • 8B • Updated Aug 16 • 6
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke_openthoughts-llama3-hehe-ta_ps_ct-v2 • Text Generation • 8B • Updated Aug 26 • 6
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke-17k-llama3-hehe-ta_and_ps • Text Generation • 8B • Updated Aug 16 • 7
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke-35k_all_cotif-w_partial_soln • Text Generation • 8B • Updated Aug 16 • 6
Harsh1729/R1-8B-SFT-cotroller_dataset-bespoke-52k_all_cotif-v6-w_partial_soln-w_change_of_thgt • Text Generation • 8B • Updated Aug 16 • 6
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke-52k_all_cotif-w_partial_soln-w_change_of_thgt • Text Generation • 8B • Updated Aug 16 • 7
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_no_kl • Text Generation • 8B • Updated Aug 20 • 8
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke-17k-llama3-plan_generation-train • Text Generation • 1B • Updated Aug 26 • 5
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke-17k-llama3-uncertainty_management-train • Text Generation • 1B • Updated Aug 26 • 7
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke-17k-llama3-active_computation-train • Text Generation • 1B • Updated Aug 27 • 7
skyai798/STAR-1_DeepSeek-R1-Distill-Llama-8B_sft-complete-dpo • Text Generation • 8B • Updated Sep 2 • 7
hlttxdy/STAR-1_DeepSeek-R1-Distill-Llama-8B_dpo_safety_mix • Text Generation • 8B • Updated Sep 15 • 44
hlttxdy/DS-8B_dpo_over_refusal_mix_basenum1_safep0.1_epoch1_lr5e-6_beta0.05_ftx0.2 • Text Generation • 8B • Updated Sep 21 • 8
jamescallander/DeepSeek-R1-Distill-Llama-8B_w8a8_g128_rk3588.rkllm • Text Generation • Updated Oct 9 • 9
LRM-Conta-Detection-Arena/sft-conta-deepseek-distill-llama3-8b • Text Generation • 8B • Updated Oct 9 • 7
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke_openthoughts3-ta_multiple • Text Generation • 1B • Updated Oct 6 • 8
Harsh1729/R1-Distill-Llama-8B-SFT-cotroller_dataset-bespoke_openthoughts3-ta_multiple-v2 • Text Generation • 1B • Updated Oct 6 • 6