Models: 18,433

Active filters: grpo

gyung/lfm2-1.2b-koen-mt-v8-rl-10k-merged

Text Generation • 1B • Updated 5 days ago • 24 • 2

weygu/qwen2.5-3b-graph-extraction

3B • Updated Apr 28, 2025 • 8 • 2

mradermacher/AlRazi0.1-Medical-Thinking-GGUF

3B • Updated Feb 26, 2025 • 11 • 1

ericrisco/gemma-3-4b-reasoning

Any-to-Any • 4B • Updated Mar 13, 2025 • 28 • 4

klei1/bleta-meditor-27b

Text Generation • 27B • Updated Mar 23, 2025 • 37 • 3

klei1/bleta-logjike-27b

Text Generation • Updated Mar 23, 2025 • 23 • 3

alphadl/R1-Distill-0.6B-Qwen-GRPO

Text Generation • 0.6B • Updated Jun 13, 2025 • 7 • 1

Wildstash/business-strategy-grpo-v2

Updated Oct 24, 2025 • 17 • 1

Wildstash/strategic-consultant-for-corporate-strategy

Text Generation • Updated Oct 28, 2025 • 1

0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther

Text Generation • 0.5B • Updated Nov 18, 2025 • 1.11k • 2

Intel/deepmath-v1

Text Generation • 4B • Updated 29 days ago • 186 • 9

gyung/lfm2-1.2b-koen-mt-v8-rl-10k-adapter

Text Generation • Updated 8 days ago • 65 • 1

aquiffoo/neo-3-3B-A400M-Thinking

Text Generation • Updated 6 days ago • 1

aquiffoo/neo-3-1B-A90M-Instruct

Text Generation • Updated 6 days ago • 1

Nhaass/Qwen3-VL-2B-ChartQA-GRPO

Image-to-Text • 2B • Updated 1 day ago • 6 • 1

onuryozcu/llama

Text Generation • 0.1B • Updated Mar 10, 2025 • 9

amiguel/promptTuning

8B • Updated Feb 16, 2025 • 5

sergiopaniego/Qwen2-0.5B-GRPO-test

Updated Oct 3, 2025

Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF

1B • Updated Jan 28, 2025 • 68 • 3

nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora

Updated Jan 28, 2025

sergiopaniego/Qwen2-0.5B-GRPO

Updated Jan 31, 2025

philschmid/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Jan 30, 2025 • 20 • 8

spinech/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Apr 28, 2025 • 6

Dongwei/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 2, 2025 • 5 • 1

yooneo/qwen-0.5b-r1-aha

Updated Jan 31, 2025

yooneo/qwen-1.5b-r1-aha

Updated Jan 31, 2025

spinech/qwen2.5-3b-r1-rearc-stage1

Text Generation • 3B • Updated Apr 28, 2025 • 7

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO

Text Generation • 8B • Updated Feb 3, 2025 • 7 • 1

MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured

Text Generation • 2B • Updated Feb 3, 2025 • 8 • 5

mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF

2B • Updated Feb 3, 2025 • 124 • 2