Med-2E3-M3D
Introduction
Med-2E3 is a 3D medical large vision-language model (LVLM) trained on 3D CT volumes and English medical texts (M3D-Cap and M3D-VQA). It supports tasks such as report generation and medical visual question answering (VQA).
| Config | Value |
|---|---|
| 3D Image encoder | GoodBaiBai88/M3D-CLIP |
| 2D Image encoder | google/siglip-large-patch16-256 |
| Connector | TG-IS scoring module |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 32 × 256 × 256 |
| Sequence length | 768 |
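To make the image resolution in the table concrete, here is a minimal sketch of bringing a CT volume to a 32 × 256 × 256 input. It is an illustration under stated assumptions, not the preprocessing used by Med-2E3: the (depth, height, width) axis order, the nearest-neighbour resampling, the min-max normalization, and the function names `resample_nearest`, `normalize_min_max`, and `preprocess_ct` are all hypothetical. The actual pipeline is in the Med-2E3 repository.

```python
import numpy as np

# Target shape from the config table. The (depth, height, width)
# axis order is an assumption made for this sketch.
TARGET_SHAPE = (32, 256, 256)


def resample_nearest(volume, target_shape=TARGET_SHAPE):
    """Nearest-neighbour resample of a 3D array to target_shape."""
    if volume.ndim != 3:
        raise ValueError(f"expected a 3D volume, got {volume.ndim} dimensions")
    out = volume
    for axis, new_len in enumerate(target_shape):
        old_len = out.shape[axis]
        # Map the centre of each output voxel to the nearest source index.
        idx = np.floor((np.arange(new_len) + 0.5) * old_len / new_len).astype(int)
        idx = np.clip(idx, 0, old_len - 1)
        out = np.take(out, idx, axis=axis)
    return out


def normalize_min_max(volume):
    """Scale intensities to [0, 1]; a constant volume maps to all zeros."""
    volume = volume.astype(np.float32)
    lo, hi = volume.min(), volume.max()
    if hi == lo:
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)


def preprocess_ct(volume):
    """Hypothetical helper: resample, then normalize (assumed order)."""
    return normalize_min_max(resample_nearest(volume))
```

For example, calling `preprocess_ct` on an array of shape `(40, 300, 280)` returns a `float32` array of shape `(32, 256, 256)` with values in `[0, 1]`.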
Quickstart
Please refer to Med-2E3.
Citation
@article{shi2024med,
title={Med-2e3: A 2d-enhanced 3d medical multimodal large language model},
author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
journal={arXiv preprint arXiv:2411.12783},
year={2024}
}