Reward-Augmented Data Enhances Direct Preference Alignment of LLMs • Paper 2410.08067 • Published Oct 10, 2024
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment • Paper 2405.19332 • Published May 29, 2024