YaxinLuo PRO

https://yaxin9luo.github.io./

Yaxin9Luo

AI & ML interests

Audio-visual speaker extraction, video understanding, self-supervised large speech models

upvoted a paper 3 months ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 99

upvoted 2 papers 7 months ago

Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30, 2025 • 80

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Paper • 2505.24878 • Published May 30, 2025 • 23

upvoted 2 papers about 1 year ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 99

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Paper • 2410.13859 • Published Oct 17, 2024 • 8