Paper: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (arXiv:2203.05482)
This is a merge of pre-trained language models created using mergekit.
This model was merged using the linear merge method, with Qwen/Qwen2-VL-2B-Instruct as the base model.
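A linear merge simply averages the parameters of the listed checkpoints, in the spirit of the model-soups paper above. The following Python sketch illustrates the idea only; it is not mergekit's implementation, and the helper name and inputs are hypothetical:

import torch

def linear_merge(state_dicts, weights, normalize=True):
    # Average tensors key-by-key across several state dicts (model soup).
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize: true in the config
    merged = {}
    for key in state_dicts[0]:
        avg = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        merged[key] = avg.to(torch.bfloat16)  # dtype: bfloat16, as in the config
    return merged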
The following models were included in the merge:
- prithivMLmods/Blazer.1-2B-Vision
- prithivMLmods/Caption-Pro
- prithivMLmods/ChemQwen2-vL
- prithivMLmods/JSONify-Flux
- prithivMLmods/LatexMind-2B-Codec
- prithivMLmods/Omni-Reasoner-2B
- prithivMLmods/QvQ-Step-Tiny
- prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
- prithivMLmods/Qwen2-VL-OCR-2B-Instruct
- prithivMLmods/Radiology-Infer-Mini
- Qwen/Qwen2-VL-2B
The following YAML configuration was used to produce this model:
models:
  - model: prithivMLmods/Blazer.1-2B-Vision
  - model: prithivMLmods/Caption-Pro
  - model: prithivMLmods/ChemQwen2-vL
  - model: prithivMLmods/JSONify-Flux
  - model: prithivMLmods/LatexMind-2B-Codec
  - model: prithivMLmods/Omni-Reasoner-2B
  - model: prithivMLmods/QvQ-Step-Tiny
  - model: prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
  - model: prithivMLmods/Qwen2-VL-OCR-2B-Instruct
  - model: prithivMLmods/Radiology-Infer-Mini
  - model: Qwen/Qwen2-VL-2B-Instruct
  - model: Qwen/Qwen2-VL-2B
merge_method: linear
base_model: Qwen/Qwen2-VL-2B-Instruct
parameters:
  weight: 0.5
  normalize: true
  int8_mask: true
dtype: bfloat16
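The merged checkpoint produced from this configuration can be loaded like any other Qwen2-VL model with transformers. This is a minimal sketch; the local path is a placeholder, since the merged repository name is not given in this section:

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Placeholder path: replace with the actual merged model directory or repo id.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "path/to/merged-model",
    torch_dtype=torch.bfloat16,  # matches dtype: bfloat16 in the merge config
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("path/to/merged-model")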