ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows • Paper 2505.19897 • Published May 26, 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention • Paper 2504.16083 • Published Apr 22, 2025
SCBench: A KV Cache-Centric Analysis of Long-Context Methods • Paper 2412.10319 • Published Dec 13, 2024
Data Contamination Report from the 2024 CONDA Shared Task • Paper 2407.21530 • Published Jul 31, 2024
MInference 1.0: 10x Faster Million Context Inference with a Single GPU • Article • Published Jul 11, 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • Paper 2407.02490 • Published Jul 2, 2024