Papers to read

vladbogo's Collections

updated Feb 11, 2025

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Paper • 2402.08714 • Published Feb 13, 2024 • 15
Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15, 2024 • 25
RLVF: Learning from Verbal Feedback without Overgeneralization

Paper • 2402.10893 • Published Feb 16, 2024 • 12
Coercing LLMs to do and reveal (almost) anything

Paper • 2402.14020 • Published Feb 21, 2024 • 13
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Paper • 2402.14658 • Published Feb 22, 2024 • 84
TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22, 2024 • 20
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 116
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Paper • 2402.16822 • Published Feb 26, 2024 • 17
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 18
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Paper • 2402.11753 • Published Feb 19, 2024 • 6
How Far Are We from Intelligent Visual Deductive Reasoning?

Paper • 2403.04732 • Published Mar 7, 2024 • 21
Evaluating and Mitigating Discrimination in Language Model Decisions

Paper • 2312.03689 • Published Dec 6, 2023 • 1
How predictable is language model benchmark performance?

Paper • 2401.04757 • Published Jan 9, 2024 • 2
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Paper • 2305.02547 • Published May 4, 2023 • 7
Is Cosine-Similarity of Embeddings Really About Similarity?

Paper • 2403.05440 • Published Mar 8, 2024 • 3
Multistep Consistency Models

Paper • 2403.06807 • Published Mar 11, 2024 • 15
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

Paper • 2402.18216 • Published Feb 28, 2024 • 1
V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11, 2024 • 30
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 72
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25, 2024 • 25
Can large language models explore in-context?

Paper • 2403.15371 • Published Mar 22, 2024 • 33
DreamLIP: Language-Image Pre-training with Long Captions

Paper • 2403.17007 • Published Mar 25, 2024 • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Paper • 2403.20331 • Published Mar 29, 2024 • 16
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 27
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
Stream of Search (SoS): Learning to Search in Language

Paper • 2404.03683 • Published Apr 1, 2024 • 30
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8, 2024 • 23
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10, 2024 • 111
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations

Paper • 2404.07153 • Published Apr 10, 2024 • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11, 2024 • 48
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11, 2024 • 32
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15, 2024 • 21
MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published Apr 18, 2024 • 27
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19, 2024 • 30
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 31
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published Apr 22, 2024 • 19
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22, 2024 • 45
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22, 2024 • 23
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30, 2024 • 74
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 122
Corrective Retrieval Augmented Generation

Paper • 2401.15884 • Published Jan 29, 2024 • 4
Observational Scaling Laws and the Predictability of Language Model Performance

Paper • 2405.10938 • Published May 17, 2024 • 14
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19, 2024 • 157
Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published May 20, 2024 • 30
LANISTR: Multimodal Learning from Structured and Unstructured Data

Paper • 2305.16556 • Published May 26, 2023 • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Paper • 2405.11273 • Published May 18, 2024 • 19
Not All Language Model Features Are Linear

Paper • 2405.14860 • Published May 23, 2024 • 40
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published May 24, 2024 • 28
Phased Consistency Model

Paper • 2405.18407 • Published May 28, 2024 • 48
MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 20
Xwin-LM: Strong and Scalable Alignment Practice for LLMs

Paper • 2405.20335 • Published May 30, 2024 • 17
LLMs achieve adult human performance on higher-order theory of mind tasks

Paper • 2405.18870 • Published May 29, 2024 • 17
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Paper • 2406.00888 • Published Jun 2, 2024 • 33
Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4, 2024 • 17
Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3, 2024 • 20
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Paper • 2406.02884 • Published Jun 5, 2024 • 18
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 41
Proofread: Fixes All Errors with One Tap

Paper • 2406.04523 • Published Jun 6, 2024 • 14
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11, 2024 • 36
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 28
Large Language Model Unlearning via Embedding-Corrupted Prompts

Paper • 2406.07933 • Published Jun 12, 2024 • 9
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11, 2024 • 31
Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14, 2024 • 78
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published Jun 13, 2024 • 89
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17, 2024 • 62
mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Paper • 2406.11839 • Published Jun 17, 2024 • 40
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 72
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

Paper • 2406.12849 • Published Jun 18, 2024 • 50
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Paper • 2406.12034 • Published Jun 17, 2024 • 16
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Paper • 2406.10601 • Published Jun 15, 2024 • 70
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Paper • 2406.14972 • Published Jun 21, 2024 • 7
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

Paper • 2406.13457 • Published Jun 19, 2024 • 17
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

Paper • 2406.12430 • Published Jun 18, 2024 • 7
Weight subcloning: direct initialization of transformers using larger pretrained ones

Paper • 2312.09299 • Published Dec 14, 2023 • 18
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Paper • 2406.17419 • Published Jun 25, 2024 • 16
Large Language Models Assume People are More Rational than We Really are

Paper • 2406.17055 • Published Jun 24, 2024 • 4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Paper • 2406.16855 • Published Jun 24, 2024 • 57
Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published Jun 24, 2024 • 30
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24, 2024 • 26
Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published Jun 24, 2024 • 69
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 54
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Paper • 2406.20095 • Published Jun 28, 2024 • 18
MagMax: Leveraging Model Merging for Seamless Continual Learning

Paper • 2407.06322 • Published Jul 8, 2024 • 1
A Single Transformer for Scalable Vision-Language Modeling

Paper • 2407.06438 • Published Jul 8, 2024 • 1
Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11, 2024 • 49
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12, 2024 • 140
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Paper • 2407.09121 • Published Jul 12, 2024 • 6
GAVEL: Generating Games Via Evolution and Language Models

Paper • 2407.09388 • Published Jul 12, 2024 • 17
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Paper • 2407.08726 • Published Jul 11, 2024 • 11
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Paper • 2407.08303 • Published Jul 11, 2024 • 19
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 44
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Paper • 2402.07282 • Published Feb 11, 2024 • 1
Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16, 2024 • 5
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Paper • 2407.00256 • Published Jun 28, 2024 • 1
Provably Robust DPO: Aligning Language Models with Noisy Feedback

Paper • 2403.00409 • Published Mar 1, 2024 • 2
Efficient Exploration for LLMs

Paper • 2402.00396 • Published Feb 1, 2024 • 22
Text2SQL is Not Enough: Unifying AI and Databases with TAG

Paper • 2408.14717 • Published Aug 27, 2024 • 26
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published Sep 6, 2024 • 26
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 125
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published Jan 28, 2025 • 32
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Paper • 2402.14207 • Published Feb 22, 2024 • 10
Goku: Flow Based Video Generative Foundation Models

Paper • 2502.04896 • Published Feb 7, 2025 • 107
