Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2505.18129

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27, 2025 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 125
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 68

One-RL-to-See-Them-All

One RL to See Them All: Visual Triple Unified Reinforcement Learning. GitHub: https://github.com/MiniMax-AI/One-RL-to-See-Them-All

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
One-RL-to-See-Them-All/Orsta-Data-47k

Updated Jun 4, 2025 • 516 • 17
One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 23 • 12
One-RL-to-See-Them-All/Orsta-32B-0321

Image-Text-to-Text • 33B • Updated May 26, 2025 • 16 • 2

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 662 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 40
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

about 2 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 99
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published Mar 20, 2025 • 52
Performance Trade-offs of Optimizing Small Language Models for E-Commerce

Paper • 2510.21970 • Published Oct 24, 2025 • 3

One-RL-to-See-Them-All

https://github.com/MiniMax-AI/One-RL-to-See-Them-All

One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 23 • 12
One-RL-to-See-Them-All/Orsta-32B-0321

Image-Text-to-Text • 33B • Updated May 26, 2025 • 16 • 2
One-RL-to-See-Them-All/Orsta-32B-0326

Image-Text-to-Text • 33B • Updated Jun 4, 2025 • 16 • 8
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62

One-RL-to-See-Them-All

https://github.com/MiniMax-AI/One-RL-to-See-Them-All

One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 23 • 12
One-RL-to-See-Them-All/Orsta-32B-0321

Image-Text-to-Text • 33B • Updated May 26, 2025 • 16 • 2
One-RL-to-See-Them-All/Orsta-32B-0326

Image-Text-to-Text • 33B • Updated Jun 4, 2025 • 16 • 8
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20, 2025 • 79
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 341
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19, 2025 • 83

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75

about 2 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27, 2025 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 125
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 99
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published Mar 20, 2025 • 52
Performance Trade-offs of Optimizing Small Language Models for E-Commerce

Paper • 2510.21970 • Published Oct 24, 2025 • 3

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 68

One-RL-to-See-Them-All

https://github.com/MiniMax-AI/One-RL-to-See-Them-All

One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 23 • 12
One-RL-to-See-Them-All/Orsta-32B-0321

Image-Text-to-Text • 33B • Updated May 26, 2025 • 16 • 2
One-RL-to-See-Them-All/Orsta-32B-0326

Image-Text-to-Text • 33B • Updated Jun 4, 2025 • 16 • 8
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62

One-RL-to-See-Them-All

One RL to See Them All: Visual Triple Unified Reinforcement Learning. GitHub: https://github.com/MiniMax-AI/One-RL-to-See-Them-All

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
One-RL-to-See-Them-All/Orsta-Data-47k

Updated Jun 4, 2025 • 516 • 17
One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 23 • 12
One-RL-to-See-Them-All/Orsta-32B-0321

Image-Text-to-Text • 33B • Updated May 26, 2025 • 16 • 2

One-RL-to-See-Them-All

https://github.com/MiniMax-AI/One-RL-to-See-Them-All

One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 23 • 12
One-RL-to-See-Them-All/Orsta-32B-0321

Image-Text-to-Text • 33B • Updated May 26, 2025 • 16 • 2
One-RL-to-See-Them-All/Orsta-32B-0326

Image-Text-to-Text • 33B • Updated Jun 4, 2025 • 16 • 8
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 662 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 40
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20, 2025 • 79
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 341
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19, 2025 • 83

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs