Papers to read
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward
Finetuning of Diffusion Models
Paper
• 2402.08714
• Published • 15
Data Engineering for Scaling Language Models to 128K Context
Paper
• 2402.10171
• Published • 25
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper
• 2402.10893
• Published • 12
Coercing LLMs to do and reveal (almost) anything
Paper
• 2402.14020
• Published • 13
OpenCodeInterpreter: Integrating Code Generation with Execution and
Refinement
Paper
• 2402.14658
• Published • 84
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper
• 2402.14289
• Published • 20
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published • 116
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Paper
• 2402.16822
• Published • 17
Beyond Language Models: Byte Models are Digital World Simulators
Paper
• 2402.19155
• Published • 53
Finetuned Multimodal Language Models Are High-Quality Image-Text Data
Filters
Paper
• 2403.02677
• Published • 18
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Paper
• 2402.11753
• Published • 6
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper
• 2403.04732
• Published • 21
Evaluating and Mitigating Discrimination in Language Model Decisions
Paper
• 2312.03689
• Published • 1
How predictable is language model benchmark performance?
Paper
• 2401.04757
• Published • 2
PersonaLLM: Investigating the Ability of Large Language Models to
Express Personality Traits
Paper
• 2305.02547
• Published • 7
Is Cosine-Similarity of Embeddings Really About Similarity?
Paper
• 2403.05440
• Published • 3
Multistep Consistency Models
Paper
• 2403.06807
• Published • 15
LLM Task Interference: An Initial Study on the Impact of Task-Switch in
Conversational History
Paper
• 2402.18216
• Published • 1
V3D: Video Diffusion Models are Effective 3D Generators
Paper
• 2403.06738
• Published • 30
Mixtral of Experts
Paper
• 2401.04088
• Published • 160
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published • 72
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image
Generation
Paper
• 2403.16990
• Published • 25
Can large language models explore in-context?
Paper
• 2403.15371
• Published • 33
DreamLIP: Language-Image Pre-training with Long Captions
Paper
• 2403.17007
• Published • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision
Language Models
Paper
• 2403.20331
• Published • 16
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact
Language Model
Paper
• 2404.01331
• Published • 27
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale
Prediction
Paper
• 2404.02905
• Published • 74
Stream of Search (SoS): Learning to Search in Language
Paper
• 2404.03683
• Published • 30
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
Understanding
Paper
• 2404.05726
• Published • 23
Leave No Context Behind: Efficient Infinite Context Transformers with
Infini-attention
Paper
• 2404.07143
• Published • 111
Lost in Translation: Modern Neural Networks Still Struggle With Small
Realistic Image Transformations
Paper
• 2404.07153
• Published • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency
Feedback
Paper
• 2404.07987
• Published • 48
Ferret-v2: An Improved Baseline for Referring and Grounding with Large
Language Models
Paper
• 2404.07973
• Published • 32
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse
Controls to Any Diffusion Model
Paper
• 2404.09967
• Published • 21
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper
• 2404.12385
• Published • 27
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper
• 2404.12803
• Published • 30
Groma: Localized Visual Tokenization for Grounding Multimodal Large
Language Models
Paper
• 2404.13013
• Published • 31
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension
and Generation
Paper
• 2404.14396
• Published • 19
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper
• 2404.14047
• Published • 45
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
• 2404.14507
• Published • 23
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper
• 2404.19427
• Published • 74
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper
• 2405.00732
• Published • 122
Corrective Retrieval Augmented Generation
Paper
• 2401.15884
• Published • 4
Observational Scaling Laws and the Predictability of Language Model
Performance
Paper
• 2405.10938
• Published • 14
Your Transformer is Secretly Linear
Paper
• 2405.12250
• Published • 157
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
• 2405.12399
• Published • 30
LANISTR: Multimodal Learning from Structured and Unstructured Data
Paper
• 2305.16556
• Published • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Paper
• 2405.11273
• Published • 19
Not All Language Model Features Are Linear
Paper
• 2405.14860
• Published • 40
Stacking Your Transformers: A Closer Look at Model Growth for Efficient
LLM Pre-Training
Paper
• 2405.15319
• Published • 28
Phased Consistency Model
Paper
• 2405.18407
• Published • 48
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper
• 2405.20340
• Published • 20
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
Paper
• 2405.20335
• Published • 17
LLMs achieve adult human performance on higher-order theory of mind
tasks
Paper
• 2405.18870
• Published • 17
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
• 2406.00888
• Published • 33
Guiding a Diffusion Model with a Bad Version of Itself
Paper
• 2406.02507
• Published • 17
Self-Improving Robust Preference Optimization
Paper
• 2406.01660
• Published • 20
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with
LLM
Paper
• 2406.02884
• Published • 18
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper
• 2406.02657
• Published • 41
Proofread: Fixes All Errors with One Tap
Paper
• 2406.04523
• Published • 14
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio
Understanding in Video-LLMs
Paper
• 2406.07476
• Published • 36
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation
in Videos
Paper
• 2406.08407
• Published • 28
Large Language Model Unlearning via Embedding-Corrupted Prompts
Paper
• 2406.07933
• Published • 9
An Image is Worth More Than 16x16 Patches: Exploring Transformers on
Individual Pixels
Paper
• 2406.09415
• Published • 51
TextGrad: Automatic "Differentiation" via Text
Paper
• 2406.07496
• Published • 31
Make It Count: Text-to-Image Generation with an Accurate Number of
Objects
Paper
• 2406.10210
• Published • 78
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context
Reinforcement Learning
Paper
• 2406.08973
• Published • 89
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and
Instruction-Tuning Dataset for LVLMs
Paper
• 2406.11833
• Published • 62
mDPO: Conditional Preference Optimization for Multimodal Large Language
Models
Paper
• 2406.11839
• Published • 40
How Do Large Language Models Acquire Factual Knowledge During
Pretraining?
Paper
• 2406.11813
• Published • 31
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs
with Nothing
Paper
• 2406.08464
• Published • 72
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective
Distillation and Unlabeled Data Augmentation
Paper
• 2406.12849
• Published • 50
Self-MoE: Towards Compositional Large Language Models with
Self-Specialized Experts
Paper
• 2406.12034
• Published • 16
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN
Inversion and High Quality Image Editing
Paper
• 2406.10601
• Published • 70
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Paper
• 2406.14972
• Published • 7
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Paper
• 2406.13457
• Published • 17
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large
Language Models as Decision Makers
Paper
• 2406.12430
• Published • 7
Weight subcloning: direct initialization of transformers using larger
pretrained ones
Paper
• 2312.09299
• Published • 18
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended
Multi-Doc QA
Paper
• 2406.17419
• Published • 16
Large Language Models Assume People are More Rational than We Really are
Paper
• 2406.17055
• Published • 4
DreamBench++: A Human-Aligned Benchmark for Personalized Image
Generation
Paper
• 2406.16855
• Published • 57
Video-Infinity: Distributed Long Video Generation
Paper
• 2406.16260
• Published • 30
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in
Large Video-Language Models
Paper
• 2406.16338
• Published • 26
Adam-mini: Use Fewer Learning Rates To Gain More
Paper
• 2406.16793
• Published • 69
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and
Understanding
Paper
• 2406.19389
• Published • 54
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Paper
• 2406.20095
• Published • 18
MagMax: Leveraging Model Merging for Seamless Continual Learning
Paper
• 2407.06322
• Published • 1
A Single Transformer for Scalable Vision-Language Modeling
Paper
• 2407.06438
• Published • 1
Video Diffusion Alignment via Reward Gradients
Paper
• 2407.08737
• Published • 49
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper
• 2407.09025
• Published • 140
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled
Refusal Training
Paper
• 2407.09121
• Published • 6
GAVEL: Generating Games Via Evolution and Language Models
Paper
• 2407.09388
• Published • 17
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using
Large-scale Public Data
Paper
• 2407.08726
• Published • 11
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal
Perception
Paper
• 2407.08303
• Published • 19
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context
Window?
Paper
• 2407.11963
• Published • 44
How do Large Language Models Navigate Conflicts between Honesty and
Helpfulness?
Paper
• 2402.07282
• Published • 1
Fewer Truncations Improve Language Modeling
Paper
• 2404.10830
• Published • 5
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert
Prompts
Paper
• 2407.00256
• Published • 1
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Paper
• 2403.00409
• Published • 2
Efficient Exploration for LLMs
Paper
• 2402.00396
• Published • 22
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Paper
• 2408.14717
• Published • 26
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized
Academic Assistance
Paper
• 2409.04593
• Published • 26
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model
Post-training
Paper
• 2501.17161
• Published • 125
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper
• 2501.16975
• Published • 32
Assisting in Writing Wikipedia-like Articles From Scratch with Large
Language Models
Paper
• 2402.14207
• Published • 10
Goku: Flow Based Video Generative Foundation Models
Paper
• 2502.04896
• Published • 107