Stuff I'm going to read - a gmongaras Collection

updated 2 days ago

LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published Jan 6 • 154

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper • 2601.07832 • Published Jan 12 • 52

Motion Attribution for Video Generation
Paper • 2601.08828 • Published Jan 13 • 71

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published Jan 27 • 24

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
Paper • 2601.17367 • Published Jan 24 • 34

Advancing Open-source World Models
Paper • 2601.20540 • Published Jan 28 • 128

Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
Paper • 2601.21709 • Published Jan 29 • 2

ERNIE 5.0 Technical Report
Paper • 2602.04705 • Published 28 days ago • 261

FASA: Frequency-aware Sparse Attention
Paper • 2602.03152 • Published 29 days ago • 150

LLaDA2.1: Speeding Up Text Diffusion via Token Editing
Paper • 2602.08676 • Published 23 days ago • 68

MOVA: Towards Scalable and Synchronized Video-Audio Generation
Paper • 2602.08794 • Published 23 days ago • 154

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper • 2602.05400 • Published 27 days ago • 343

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
Paper • 2602.10560 • Published 21 days ago • 29

Towards Autonomous Mathematics Research
Paper • 2602.10177 • Published 22 days ago • 36

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
Paper • 2602.10934 • Published 21 days ago • 49

Experiential Reinforcement Learning
Paper • 2602.13949 • Published 18 days ago • 68

BitDance: Scaling Autoregressive Generative Models with Binary Tokens
Paper • 2602.14041 • Published 17 days ago • 52

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
Paper • 2602.15620 • Published 15 days ago • 3

SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Paper • 2602.12675 • Published 19 days ago • 53

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published 21 days ago • 216

Avey-B
Paper • 2602.15814 • Published 15 days ago • 3

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers
Paper • 2602.18292 • Published 12 days ago • 10

Test-Time Training with KV Binding Is Secretly Linear Attention
Paper • 2602.21204 • Published 8 days ago • 29

Memory Caching: RNNs with Growing Memory
Paper • 2602.24281 • Published 5 days ago • 7