OpenEvals

community

AI & ML interests

LLM evaluation

Recent Activity

nielsr submitted a paper 1 day ago

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

SaylorTwift updated a dataset 8 days ago

OpenEvals/leaderboard-data

nielsr submitted a paper 12 days ago

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

submitted a paper to Daily Papers 1 day ago

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Paper • 2603.28130 • Published 6 days ago • 7

updated a dataset 8 days ago

OpenEvals/leaderboard-data

Viewer • Updated 7 days ago • 105 • 882 • 1

submitted a paper to Daily Papers 12 days ago

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Paper • 2603.19209 • Published 16 days ago • 5

submitted a paper to Daily Papers 16 days ago

V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning

Paper • 2603.14482 • Published 20 days ago • 26

updated a Space 16 days ago

Official Benchmarks Leaderboard 2026

Explore and compare AI model scores across official benchmarks

submitted a paper to Daily Papers 17 days ago

Omnilingual MT: Machine Translation for 1,600 Languages

Paper • 2603.16309 • Published 18 days ago • 20

published a Space 18 days ago

Official Benchmarks Leaderboard 2026

Explore and compare AI model scores across official benchmarks

published a dataset 18 days ago

OpenEvals/leaderboard-data

Viewer • Updated 7 days ago • 105 • 882 • 1

authored a paper 23 days ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 23 days ago • 64

in OpenEvals/README 26 days ago

New Benchmark Dataset

#2 opened 2 months ago by

submitted a paper to Daily Papers about 1 month ago

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Paper • 2602.17807 • Published Feb 19 • 7

submitted a paper to Daily Papers about 2 months ago

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Paper • 2602.11389 • Published Feb 11 • 8

in OpenEvals/README about 2 months ago

Community Evals Feedback

#1 opened 2 months ago by

submitted a paper to Daily Papers 2 months ago

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

Paper • 2601.17950 • Published Jan 25 • 4

updated a Space 3 months ago

Benchmark Finder

A space to view and inspect all the tasks in lighteval

submitted 2 papers to Daily Papers 3 months ago

TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration

Paper • 2601.04544 • Published Jan 8 • 6

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Paper • 2512.19535 • Published Dec 22, 2025 • 12

in OpenEvals/MuSR 4 months ago

[bot] Conversion to Parquet

#1 opened 4 months ago by parquet-converter

updated a dataset 4 months ago

OpenEvals/MuSR

Viewer • Updated Dec 12, 2025 • 756 • 69