Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps Paper • 2510.13430 • Published Oct 15, 2025 • 2
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 71
3LM: Bridging Arabic, STEM, and Code through Benchmarking Paper • 2507.15850 • Published Jul 21, 2025 • 6
NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9, 2025 • 2
Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation Paper • 2604.03395 • Published Apr 3 • 2
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling Paper • 2601.02346 • Published Jan 5 • 27
Performance Gaps in Multi-view Clustering under the Nested Matrix-Tensor Model Paper • 2402.10677 • Published Feb 16, 2024
Investigating Regularization of Self-Play Language Models Paper • 2404.04291 • Published Apr 4, 2024 • 1
Do Vision and Language Encoders Represent the World Similarly? Paper • 2401.05224 • Published Jan 10, 2024