davidsvaughn's Collection: LLM Refs
Large Language Model Alignment: A Survey (arXiv:2309.15025)
Aligning Large Language Models with Human: A Survey (arXiv:2307.12966)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF (arXiv:2310.05344)
LIMA: Less Is More for Alignment (arXiv:2305.11206)
Aligning Large Language Models through Synthetic Feedback (arXiv:2305.13735)
Generative Judge for Evaluating Alignment (arXiv:2310.05470)
JudgeLM: Fine-tuned Large Language Models are Scalable Judges (arXiv:2310.17631)
Quality-Diversity through AI Feedback (arXiv:2310.13032)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903)
Self-Consistency Improves Chain of Thought Reasoning in Language Models (arXiv:2203.11171)
Fine-tuning Language Models with Generative Adversarial Feedback (arXiv:2305.06176)
UltraFeedback: Boosting Language Models with High-quality Feedback (arXiv:2310.01377)
Verbosity Bias in Preference Labeling by Large Language Models (arXiv:2310.10076)
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (arXiv:2309.00267)
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment (arXiv:2308.05374)
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (arXiv:2308.09662)
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM (arXiv:2311.09528)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300)
ReFT: Reasoning with Reinforced Fine-Tuning (arXiv:2401.08967)
Reasons to Reject? Aligning Language Models with Judgments (arXiv:2312.14591)
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification (arXiv:2308.07921)
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (arXiv:2309.17452)
LLM Guided Inductive Inference for Solving Compositional Problems (arXiv:2309.11688)
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications (arXiv:2402.07927)
A Review of Sparse Expert Models in Deep Learning (arXiv:2209.01667)
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey (arXiv:2401.07872)
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs (arXiv:2309.03118)
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs (arXiv:2309.09582)
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (arXiv:2310.13671)
Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model (arXiv:2310.08072)
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering (arXiv:2309.06358)
Self-Alignment with Instruction Backtranslation (arXiv:2308.06259)
A Comprehensive Analysis of Adapter Efficiency (arXiv:2305.07491)
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs (arXiv:2304.14999)
Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification (arXiv:2308.07282)
LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685)
QLoRA: Efficient Finetuning of Quantized LLMs (arXiv:2305.14314)
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models (arXiv:2402.13064)
MTEB: Massive Text Embedding Benchmark (arXiv:2210.07316)
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data (arXiv:2306.13840)
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning (arXiv:2212.07919)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Training Verifiers to Solve Math Word Problems (arXiv:2110.14168)
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (arXiv:1803.05457)
WinoGrande: An Adversarial Winograd Schema Challenge at Scale (arXiv:1907.10641)
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models (arXiv:2206.04615)
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (arXiv:2303.12712)
Deduplicating Training Data Makes Language Models Better (arXiv:2107.06499)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
Oasis: Data Curation and Assessment System for Pretraining of Large Language Models (arXiv:2311.12537)
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models (arXiv:2402.10524)
Reflexion: Language Agents with Verbal Reinforcement Learning (arXiv:2303.11366)
GPT-4 Technical Report (arXiv:2303.08774)
Beyond Language Models: Byte Models are Digital World Simulators (arXiv:2402.19155)
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models (arXiv:2404.18796)
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models (arXiv:2502.04404)
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning (arXiv:2505.01441)