wangyuchi

YuchiWang

https://wangyuchi369.github.io/

AI & ML interests

Multimodal; Generative Models

Recent Activity

new activity 22 days ago

TIGER-Lab/MMEB-V2:PNG file corruption

updated a collection 2 months ago

authored a paper 3 months ago

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Paper • 2505.22613 • Published May 28 • 9

authored a paper 12 months ago

VidTwin: Video VAE with Decoupled Structure and Dynamics

Paper • 2412.17726 • Published Dec 23, 2024 • 9

authored 7 papers about 1 year ago

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

Paper • 2310.02071 • Published Oct 3, 2023 • 4

GAIA: Zero-shot Talking Avatar Generation

Paper • 2311.15230 • Published Nov 26, 2023 • 3

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

Paper • 2402.13185 • Published Feb 20, 2024

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Paper • 2402.15527 • Published Feb 21, 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Paper • 2404.10763 • Published Apr 16, 2024

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Paper • 2405.15758 • Published May 24, 2024 • 1

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

Paper • 2406.08096 • Published Jun 12, 2024