Accelerated Preference Optimization for Large Language Model Alignment Paper • 2410.06293 • Published Oct 8, 2024 • 5
MARS: Unleashing the Power of Variance Reduction for Training Large Models Paper • 2411.10438 • Published Nov 15, 2024 • 13
DPLM-2: A Multimodal Diffusion Protein Language Model Paper • 2410.13782 • Published Oct 17, 2024 • 22
An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models Paper • 2408.00724 • Published Aug 1, 2024 • 2
General Preference Modeling with Preference Representations for Aligning Language Models Paper • 2410.02197 • Published Oct 3, 2024 • 9
ProteinBench: A Holistic Evaluation of Protein Foundation Models Paper • 2409.06744 • Published Sep 10, 2024 • 8
Post: We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀
🤗 paper: Self-Play Preference Optimization for Language Model Alignment (2405.00675)
⭐ code: https://github.com/uclaml/SPPO
🤗 models: UCLA-AGI/sppo-6635fdd844f2b2e4a94d0b9a
UCLAML/synthetic_data_mistral-7b-instruct-sppo-iter3_score Viewer • Updated Jun 17, 2024 • 20.5k • 55
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision Paper • 2403.09472 • Published Mar 14, 2024 • 1
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1, 2024 • 28