Zhibin Lan
zhibinlan
AI & ML interests
None yet
Recent Activity
updated a dataset 1 day ago: zhibinlan/OCRMT30K
published a dataset 1 day ago: zhibinlan/OCRMT30K
updated a collection 26 days ago: UME-R1
Organizations
None yet
LLaVE
LLaVE is a series of large language and vision embedding models trained on a variety of multimodal embedding datasets
- zhibinlan/LLaVE-0.5B
  Image-Text-to-Text • 0.9B • Updated • 34 • 7
- zhibinlan/LLaVE-2B
  Image-Text-to-Text • 2B • Updated • 114 • 45
- zhibinlan/LLaVE-7B
  Image-Text-to-Text • 8B • Updated • 48 • 5
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
  Paper • 2503.04812 • Published • 17
UME-R1
UME-R1 is a framework designed to endow multimodal embedding models with the flexibility to switch between discriminative and generative embeddings
models 7
zhibinlan/UME-R1-7B
Image-Text-to-Text • 8B • Updated • 45 • 5
zhibinlan/UME-R1-2B
Image-Text-to-Text • 2B • Updated • 2.18k • 5
zhibinlan/LLaVE-7B
Image-Text-to-Text • 8B • Updated • 48 • 5
zhibinlan/LLaVE-0.5B
Image-Text-to-Text • 0.9B • Updated • 34 • 7
zhibinlan/LLaVE-2B
Image-Text-to-Text • 2B • Updated • 114 • 45
zhibinlan/AVG-LLaVA
Image-Text-to-Text • 7B • Updated • 6 • 2
zhibinlan/AVG-LLaVA-Stage3
Image-Text-to-Text • 7B • Updated • 1