
YaxinLuo PRO

https://yaxin9luo.github.io./
  • Yaxin9Luo

AI & ML interests

Audio-visual speaker extraction, video understanding, self-supervised large speech models

Organizations

Mohamed Bin Zayed University of Artificial Intelligence

upvoted a paper 3 months ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 99
upvoted 2 papers 7 months ago

Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30, 2025 • 80

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Paper • 2505.24878 • Published May 30, 2025 • 23
upvoted 2 papers about 1 year ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 99

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Paper • 2410.13859 • Published Oct 17, 2024 • 8