Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key Paper • 2501.09695 • Published Jan 16, 2025 • 1
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs Paper • 2505.12929 • Published May 19, 2025 • 3