Data generation
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 50

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17

Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
Paper • 2405.18952 • Published • 10

MAmmoTH2: Scaling Instructions from the Web
Paper • 2405.03548 • Published • 6

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 71

West-of-N: Synthetic Preference Generation for Improved Reward Modeling
Paper • 2401.12086 • Published • 1

OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas
Paper • 2501.15427 • Published • 6

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
Paper • 2501.12273 • Published • 14

How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 52

Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 31