Image-Text-to-Text
Transformers
TensorBoard
Safetensors
llavaonevision1_5
text-generation
conversational
Jinghao-Guo committed on
Commit
cb70e64
·
verified ·
1 Parent(s): 258862c

Transfer model via script

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,290 @@
1
+ ---
2
+ base_model:
3
+ - DeepGlint-AI/rice-vit-large-patch14-560
4
+ - Qwen/Qwen3-4B-Instruct-2507
5
+ datasets:
6
+ - lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M
7
+ - lmms-lab/LLaVA-OneVision-1.5-Insturct-Data
8
+ - HuggingFaceM4/FineVision
9
+ library_name: transformers
10
+ license: apache-2.0
11
+ pipeline_tag: image-text-to-text
12
+ ---
13
+
14
+ <div align="center">
15
+
16
+ <h1>LLaVA-OneVision-1.5: Fully Open-Source State-of-the-Art Vision-Language Model</h1>
17
+
18
+
19
+ <p>
20
+ <a href="https://huggingface.co/papers/2509.23661">
21
+ <img alt="Paper" src="https://img.shields.io/badge/Paper-b31b1b?style=for-the-badge&logo=arXiv&logoColor=white">
22
+ </a>
23
+ <a href="https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5">
24
+ <img alt="Code" src="https://img.shields.io/badge/Code-181717?style=for-the-badge&logo=github&logoColor=white">
25
+ </a>
26
+ <a href="https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M">
27
+ <img alt="Mid-Training Dataset" src="https://img.shields.io/badge/Mid--Training%20Dataset-f59e0b?style=for-the-badge&logo=huggingface&logoColor=white">
28
+ </a>
29
+ <a href="https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Instruct-Data">
30
+ <img alt="Instruct Dataset" src="https://img.shields.io/badge/Instruct%20Dataset-3fb950?style=for-the-badge&logo=huggingface&logoColor=white">
31
+ </a>
32
+ <a href="https://huggingface.co/spaces/lmms-lab/LLaVA-OneVision-1.5">
33
+ <img alt="Demo" src="https://img.shields.io/badge/Demo-1f6feb?style=for-the-badge&logo=huggingface&logoColor=white">
34
+ </a>
35
+ <a href="https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct/tensorboard">
36
+ <img alt="TensorBoard" src="https://img.shields.io/badge/TensorBoard-FF6F00?style=for-the-badge&logo=tensorflow&logoColor=white">
37
+ </a>
38
+ </p>
39
+
40
+ </div>
41
+
42
+
43
+
44
+ ## Introduction
45
+
46
+ LLaVA-OneVision-1.5 is a fully open-source family of large multimodal models (LMMs) built to democratize multimodal training. Trained on native‑resolution images, it delivers state‑of‑the‑art performance at substantially lower cost. The project also releases high‑quality pretraining and SFT data, a complete and efficient training framework with recipes and configs, and comprehensive logs to support transparent, reproducible research.
47
+ #### **Superior Performance**
48
+ - The model leads on multiple multimodal benchmarks and generally surpasses Qwen2.5-VL.
49
+ - Training on native-resolution images significantly improves its visual understanding.
50
+
51
+ #### **High-Quality Data at Scale**
52
+ - The pretraining corpus comprises large-scale, concept-balanced, diverse, and high-quality captions curated with strict filtering and quality control.
53
+ - The instruction-tuning dataset is comprehensive and covers a wide range of tasks.
54
+
55
+ #### **Ultra-Efficient Training Framework**
56
+ - The end-to-end training cost is about $16,000 on A100 GPUs at roughly $0.60 per GPU-hour.
57
+ - The system is built on Megatron-LM with support for MoE, FP8, and long-sequence parallelism, and the codebase is optimized for cost-effective scaling.
58
+
59
+ #### **Fully Open Framework**
60
+ - The project releases high-quality pretraining and SFT datasets along with the complete training framework, configurations, and recipes.
61
+ - It also provides detailed training logs and metrics to enable reproducibility and community adoption.
62
+
63
+
64
+ ## Models
65
+
66
+ | Model | HF Link | Training Log |
67
+ |---|---|---|
68
+ | LLaVA-OV-1.5-4B-Instruct | [🤗 HF / 4B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct/tensorboard) |
69
+ | LLaVA-OV-1.5-8B-Instruct | [🤗 HF / 8B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct/tensorboard) |
70
+
71
+ ## Dataset
72
+
73
+ | Description | Link | Status |
74
+ |--------------------|--------------------------------------------------------------------------------------------------------|-------------|
75
+ | LLaVA-OneVision-1.5-Mid-Training-85M | [🤗HF / Mid-Training 85M](https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M) | Uploading… |
76
+ | LLaVA-OneVision-1.5-Instruct | [🤗HF / Instruct-Data](https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Instruct-Data) | Available |
77
+
78
+ ## Evaluation Results
79
+ All evaluations were conducted using [lmms_eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
80
+
81
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/655c70d331c4978366d4b2e6/J8oBYmQkTOC6pBNLgJn9d.png)
82
+
83
+ ## Quick Start with HuggingFace
84
+
85
+ The following code snippet shows how to use the chat model with `transformers` and `qwen_vl_utils`:
86
+
87
+ ```python
88
+ from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM
89
+ from qwen_vl_utils import process_vision_info
90
+ model_path = "lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"
91
+
92
+ # default: Load the model on the available device(s)
93
+ model = AutoModelForCausalLM.from_pretrained(
94
+ model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
95
+ )
96
+
97
+ # default processor
98
+ processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
99
+
100
+ messages = [
101
+ {
102
+ "role": "user",
103
+ "content": [
104
+ {
105
+ "type": "image",
106
+ "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
107
+ },
108
+ {"type": "text", "text": "Describe this image."},
109
+ ],
110
+ }
111
+ ]
112
+
113
+ # Preparation for inference
114
+ text = processor.apply_chat_template(
115
+ messages, tokenize=False, add_generation_prompt=True
116
+ )
117
+ image_inputs, video_inputs = process_vision_info(messages)
118
+ inputs = processor(
119
+ text=[text],
120
+ images=image_inputs,
121
+ videos=video_inputs,
122
+ padding=True,
123
+ return_tensors="pt",
124
+ )
125
+ inputs = inputs.to("cuda")
126
+
127
+ # Inference: Generation of the output
128
+ generated_ids = model.generate(**inputs, max_new_tokens=1024)
129
+ generated_ids_trimmed = [
130
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
131
+ ]
132
+ output_text = processor.batch_decode(
133
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
134
+ )
135
+ print(output_text)
136
+ ```
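The same interface extends to multi-image conversations. The sketch below is a minimal illustration that reuses `model`, `processor`, and `process_vision_info` from the snippet above; the duplicated image URL and the prompt are placeholders, not an official example.

```python
# Minimal multi-image sketch (reuses model/processor from the snippet above).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "text", "text": "Describe each image in one sentence."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt"
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True))
```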
137
+
138
+ ## Evaluation
139
+ ```
140
+ # pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
141
+
142
+ accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
143
+ --model=llava_onevision1_5 \
144
+ --model_args=pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000 \
145
+ --tasks=mmmu_val,mmmu_pro_standard,mmbench_en_test,mmerealworld,mmerealworld_cn,ai2d,ai2d_no_mask,vstar_bench,chartqa,charxiv,docvqa_test,mathvista_testmini,mmstar,scienceqa \
146
+ --batch_size=1
147
+ ```
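For a quick sanity check before launching the full benchmark suite, a single-task run with the same flags should work; the process count and task choice below are illustrative, so adjust them to your setup.

```
accelerate launch --num_processes=1 --main_process_port 12399 -m lmms_eval \
    --model=llava_onevision1_5 \
    --model_args=pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000 \
    --tasks=mmmu_val \
    --batch_size=1
```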
148
+
149
+
150
+
151
+ ### Mid-Training
152
+
153
+ To improve model training efficiency, we implement offline sample packing:
154
+
155
+ 1. Download the [**Mid-Training-85M Dataset**](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M)
156
+ 2. Pack the data into the webdataset format; refer to [**Examples: offline packing**](examples_offline_packing) and [**Offline Padding-Free Data Packing**](examples/llava_ov_1_5/sample_packing/README.md). A minimal sharding sketch follows below.
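The sketch below shows only the webdataset sharding step, assuming `pip install webdataset` and a hypothetical `iter_samples()` generator; the repository's actual padding-free packing (grouping samples up to a fixed sequence length) is implemented in the linked scripts.

```python
# Minimal webdataset sharding sketch; iter_samples() is a placeholder that
# should yield (key, jpeg_bytes, caption) tuples from the downloaded data.
import webdataset as wds

with wds.ShardWriter("packed/shard-%06d.tar", maxcount=10000) as sink:
    for key, jpeg_bytes, caption in iter_samples():
        sink.write({
            "__key__": key,       # unique sample id
            "jpg": jpeg_bytes,    # raw image bytes
            "txt": caption,       # paired caption text
        })
```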
157
+
158
+
159
+ ### Instruct
160
+ 1. Download the [**LLaVA-OneVision-1.5-Instruct-Data**](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data)
161
+ 2. Convert the data into the webdataset format; refer to [**Conversion for Mixed Instruction Data**](docs/sft_data_preprocessing.md). A sketch of reading the resulting shards follows below.
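Once converted, the shards can be read back with the standard `webdataset` loading pattern; the sketch below uses placeholder shard paths and the field names from the packing sketch above, not the repository's exact schema.

```python
# Minimal webdataset reading sketch (paths and field names are assumptions).
import webdataset as wds

dataset = (
    wds.WebDataset("packed/shard-{000000..000009}.tar")
    .decode("pil")            # decode "jpg" entries into PIL images
    .to_tuple("jpg", "txt")   # yield (image, caption) pairs
)

for image, caption in dataset:
    print(image.size, caption[:80])
    break
```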
162
+
163
+ ## Roadmaps
164
+
165
+ Q4 2025 Key Deliverables:
166
+
167
+ 1. **Ultra-efficient MoE Training**
168
+ 2. **Full Video Input LLM**
169
+
170
+
171
+ ## Contributors
172
+ Thanks so much to all of our amazing contributors!
173
+
174
+ <!-- readme: collaborators,contributors,jiankangdeng/- -start -->
175
+ <table>
176
+ <tbody>
177
+ <tr>
178
+ <td align="center">
179
+ <a href="https://github.com/fdcp">
180
+ <img src="https://avatars.githubusercontent.com/u/15667917?v=4" width="80;" alt="fdcp"/>
181
+ <br />
182
+ <sub><b>fdcp</b></sub>
183
+ </a>
184
+ </td>
185
+ <td align="center">
186
+ <a href="https://github.com/anxiangsir">
187
+ <img src="https://avatars.githubusercontent.com/u/31175974?v=4" width="80;" alt="anxiangsir"/>
188
+ <br />
189
+ <sub><b>anxiangsir</b></sub>
190
+ </a>
191
+ </td>
192
+ <td align="center">
193
+ <a href="https://github.com/yiyexy">
194
+ <img src="https://avatars.githubusercontent.com/u/35927125?v=4" width="80;" alt="yiyexy"/>
195
+ <br />
196
+ <sub><b>yiyexy</b></sub>
197
+ </a>
198
+ </td>
199
+ <td align="center">
200
+ <a href="https://github.com/wideyard">
201
+ <img src="https://avatars.githubusercontent.com/u/101321826?v=4" width="80;" alt="wideyard"/>
202
+ <br />
203
+ <sub><b>wideyard</b></sub>
204
+ </a>
205
+ </td>
206
+ <td align="center">
207
+ <a href="https://github.com/chengzheng345">
208
+ <img src="https://avatars.githubusercontent.com/u/209475443?v=4" width="80;" alt="chengzheng345"/>
209
+ <br />
210
+ <sub><b>chengzheng345</b></sub>
211
+ </a>
212
+ </td>
213
+ <td align="center">
214
+ <a href="https://github.com/killTheHostage">
215
+ <img src="https://avatars.githubusercontent.com/u/16442720?v=4" width="80;" alt="killTheHostage"/>
216
+ <br />
217
+ <sub><b>killTheHostage</b></sub>
218
+ </a>
219
+ </td>
220
+ <td align="center">
221
+ <a href="https://github.com/mathCrazyy">
222
+ <img src="https://avatars.githubusercontent.com/u/20607153?v=4" width="80;" alt="mathCrazyy"/>
223
+ <br />
224
+ <sub><b>mathCrazyy</b></sub>
225
+ </a>
226
+ </td>
227
+ <td align="center">
228
+ <a href="https://github.com/yunglechao">
229
+ <img src="https://avatars.githubusercontent.com/u/7631185?v=4" width="80;" alt="yunglechao"/>
230
+ <br />
231
+ <sub><b>yunglechao</b></sub>
232
+ </a>
233
+ </td>
234
+ </tr>
235
+ <tr>
236
+ <td align="center">
237
+ <a href="https://github.com/RobitYadda">
238
+ <img src="https://avatars.githubusercontent.com/u/6811311?v=4" width="80;" alt="RobitYadda"/>
239
+ <br />
240
+ <sub><b>RobitYadda</b></sub>
241
+ </a>
242
+ </td>
243
+ </tr>
244
+ </tbody>
245
+ </table>
246
+ <!-- readme: collaborators,contributors,jiankangdeng/- -end -->
247
+
248
+ ## Citation
249
+
250
+ If you find *LLaVA-OneVision-1.5* useful in your research, please consider citing the following related papers:
251
+
252
+ ```
253
+ @inproceedings{LLaVA-OneVision-1.5,
254
+ title={LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training},
255
+ author={An, Xiang and Xie, Yin and Yang, Kaicheng and Zhang, Wenkang and Zhao, Xiuwei and Cheng, Zheng and Wang, Yirui and Xu, Songcen and Chen, Changrui and Wu, Chunsheng and Tan, Huajie and Li, Chunyuan and Yang, Jing and Yu, Jie and Wang, Xiyao and Qin, Bin and Wang, Yumeng and Yan, Zizhen and Feng, Ziyong and Liu, Ziwei and Li, Bo and Deng, Jiankang},
256
+ booktitle={arxiv},
257
+ year={2025}
258
+ }
259
+
260
+ @inproceedings{xie2025region,
261
+ title={Region-based Cluster Discrimination for Visual Representation Learning},
262
+ author={Xie, Yin and Yang, Kaicheng and An, Xiang and Wu, Kun and Zhao, Yongle and Deng, Weimo and Ran, Zimin and Wang, Yumeng and Feng, Ziyong and Miles, Roy and Elezi, Ismail and Deng, Jiankang},
263
+ booktitle={ICCV},
264
+ year={2025}
265
+ }
266
+
267
+ @article{lillava,
268
+ title={LLaVA-OneVision: Easy Visual Task Transfer},
269
+ author={Li, Bo and Zhang, Yuanhan and Guo, Dong and Zhang, Renrui and Li, Feng and Zhang, Hao and Zhang, Kaichen and Zhang, Peiyuan and Li, Yanwei and Liu, Ziwei and Li, Chunyuan},
270
+ journal={Transactions on Machine Learning Research},
271
+ year={2024}
272
+ }
273
+ ```
274
+
275
+ ## Acknowledgement
276
+
277
+ We extend our sincere gratitude to the **AIAK team** of the [**Baige AI computing platform**](https://cloud.baidu.com/product/aihc.html) from **Baidu AI Cloud** for providing an exceptional training framework. The outstanding capabilities of AIAK-Training-LLM and AIAK-Megatron significantly accelerated our training with remarkable efficiency. For full AIAK support, please contact Baidu Cloud.
278
+
279
+
280
+ We also thank the maintainers and contributors of the following open-source projects, whose work greatly inspired and supported our research:
281
+
282
+ - LLaVA: Large Language-and-Vision Assistant — [LLaVA](https://github.com/haotian-liu/LLaVA)
283
+ - LLaVA-NeXT: Next-generation multi-modal assistant — [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT)
284
+ - lmms-eval: A standardized evaluation framework for Large Multimodal Models — [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)
285
+ - Megatron-LM: Efficient, scalable training for large language models — [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
286
+ - Qwen2.5-VL: Strong vision-language foundation model — [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
287
+ - InternVL: Open-source large-scale vision-language foundation model — [InternVL](https://github.com/OpenGVLab/InternVL)
288
+ - Qwen3: Next-generation Qwen LLM — [Qwen](https://github.com/QwenLM/Qwen)
289
+ - MetaCLIP: Scalable contrastive pretraining — [MetaCLIP](https://github.com/facebookresearch/MetaCLIP)
290
+ - FineVision: Open Data Is All You Need — [FineVision](https://huggingface.co/spaces/HuggingFaceM4/FineVision)
added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
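These entries extend the base vocabulary with vision and tool-use markers; for example, `<|image_pad|>` maps to 151655, matching `image_token_id` in `config.json`. A minimal sketch for inspecting them (the repo id below is an assumption; substitute the id of this repository):

```python
# Sketch: look up the added special-token ids with the tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct", trust_remote_code=True
)
for token in ["<|image_pad|>", "<|video_pad|>", "<|vision_start|>", "<|vision_end|>", "<|im_end|>"]:
    print(token, tokenizer.convert_tokens_to_ids(token))
```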
chat_template.jinja ADDED
@@ -0,0 +1,7 @@
1
+ {% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
2
+ You are a helpful assistant.<|im_end|>
3
+ {% endif %}<|im_start|>{{ message['role'] }}
4
+ {% if message['content'] is string %}{{ message['content'] }}<|im_end|>
5
+ {% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
6
+ {% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
7
+ {% endif %}
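The template wraps each turn in `<|im_start|> … <|im_end|>`, inserts a default system prompt when none is provided, and replaces image/video entries with `<|vision_start|><|image_pad|><|vision_end|>` or `<|vision_start|><|video_pad|><|vision_end|>` markers. A minimal rendering sketch, reusing the `processor` from the Quick Start (the commented output is indicative, not verbatim):

```python
# Sketch: render the chat template to text without tokenizing.
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "demo.jpeg"},
        {"type": "text", "text": "Describe this image."},
    ]}
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Roughly:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# <|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>
# <|im_start|>assistant
```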
config.json ADDED
@@ -0,0 +1,89 @@
1
+ {
2
+ "architectures": [
3
+ "LLaVAOneVision1_5_ForConditionalGeneration"
4
+ ],
5
+ "image_token_id": 151655,
6
+ "model_type": "llavaonevision1_5",
7
+ "text_config": {
8
+ "attention_bias": false,
9
+ "attention_dropout": 0.0,
10
+ "head_dim": 128,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 2560,
13
+ "image_token_id": null,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 9728,
16
+ "layer_types": [
17
+ "full_attention",
18
+ "full_attention",
19
+ "full_attention",
20
+ "full_attention",
21
+ "full_attention",
22
+ "full_attention",
23
+ "full_attention",
24
+ "full_attention",
25
+ "full_attention",
26
+ "full_attention",
27
+ "full_attention",
28
+ "full_attention",
29
+ "full_attention",
30
+ "full_attention",
31
+ "full_attention",
32
+ "full_attention",
33
+ "full_attention",
34
+ "full_attention",
35
+ "full_attention",
36
+ "full_attention",
37
+ "full_attention",
38
+ "full_attention",
39
+ "full_attention",
40
+ "full_attention",
41
+ "full_attention",
42
+ "full_attention",
43
+ "full_attention",
44
+ "full_attention",
45
+ "full_attention",
46
+ "full_attention",
47
+ "full_attention",
48
+ "full_attention",
49
+ "full_attention",
50
+ "full_attention",
51
+ "full_attention",
52
+ "full_attention"
53
+ ],
54
+ "max_position_embeddings": 262144,
55
+ "max_window_layers": 36,
56
+ "model_type": "LLaVAOneVision1_5_text",
57
+ "num_attention_heads": 32,
58
+ "num_hidden_layers": 36,
59
+ "num_key_value_heads": 8,
60
+ "rms_norm_eps": 1e-06,
61
+ "rope_scaling": null,
62
+ "rope_theta": 5000000.0,
63
+ "sliding_window": null,
64
+ "use_cache": true,
65
+ "use_sliding_window": false,
66
+ "video_token_id": null,
67
+ "vocab_size": 151936
68
+ },
69
+ "torch_dtype": "bfloat16",
70
+ "transformers_version": "4.53.0",
71
+ "video_token_id": 151656,
72
+ "vision_config": {
73
+ "depth": 24,
74
+ "embed_dim": 1024,
75
+ "hidden_act": "gelu",
76
+ "hidden_size": 1024,
77
+ "in_channels": 3,
78
+ "initializer_range": 0.02,
79
+ "intermediate_size": 4096,
80
+ "layer_norm_eps": 1e-05,
81
+ "model_type": "rice_vit",
82
+ "num_heads": 16,
83
+ "patch_size": 14,
84
+ "spatial_merge_size": 2,
85
+ "temporal_patch_size": 1,
86
+ "text_hidden_size": 2560
87
+ },
88
+ "vocab_size": 151936
89
+ }
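In short, the checkpoint pairs a 36-layer, hidden-size-2560 Qwen3-style decoder with a 24-layer RICE ViT (patch size 14, 2x2 spatial merge) projected into the text hidden size. A sketch for inspecting the nested configs, mirroring the `trust_remote_code` loading used in the Quick Start (the repo id is an assumption):

```python
# Sketch: load and inspect the nested configuration.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct", trust_remote_code=True
)
print(config.model_type)                                                       # llavaonevision1_5
print(config.text_config.hidden_size, config.text_config.num_hidden_layers)    # 2560 36
print(config.vision_config.patch_size, config.vision_config.spatial_merge_size)  # 14 2
```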
configuration_llavaonevision1_5.py ADDED
@@ -0,0 +1,288 @@
1
+ # coding=utf-8
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ from transformers.configuration_utils import PretrainedConfig, layer_type_validation
16
+ from transformers.modeling_rope_utils import rope_config_validation
17
+ from transformers.utils import logging
18
+
19
+ logger = logging.get_logger(__name__)
20
+
21
+ class RiceConfig(PretrainedConfig):
22
+ model_type = "rice_vit"
23
+ base_config_key = "vision_config"
24
+
25
+ def __init__(
26
+ self,
27
+ depth=24,
28
+ embed_dim=1024,
29
+ hidden_size=1024,
30
+ hidden_act="gelu",
31
+ intermediate_size=4096,
32
+ num_heads=16,
33
+ in_channels=3,
34
+ patch_size=14,
35
+ spatial_merge_size=2,
36
+ temporal_patch_size=1,
37
+ initializer_range=0.02,
38
+ layer_norm_eps=1e-05,
39
+ text_hidden_size=2560,
40
+ **kwargs,
41
+ ):
42
+ super().__init__(**kwargs)
43
+
44
+ self.depth = depth
45
+ self.embed_dim = embed_dim
46
+ self.hidden_size = hidden_size
47
+ self.hidden_act = hidden_act
48
+ self.intermediate_size = intermediate_size
49
+ self.num_heads = num_heads
50
+ self.in_channels = in_channels
51
+ self.patch_size = patch_size
52
+ self.spatial_merge_size = spatial_merge_size
53
+ self.temporal_patch_size = temporal_patch_size
54
+ self.initializer_range = initializer_range
55
+ self.layer_norm_eps = layer_norm_eps
56
+ self.text_hidden_size = text_hidden_size
57
+
58
+
59
+ class LLaVAOneVision1_5_TextConfig(PretrainedConfig):
60
+ r"""
61
+ Args:
62
+ vocab_size (`int`, *optional*, defaults to 152064):
63
+ Vocabulary size of the Qwen2VL model. Defines the number of different tokens that can be represented by the
64
+ `inputs_ids` passed when calling [`Qwen2VLModel`]
65
+ hidden_size (`int`, *optional*, defaults to 8192):
66
+ Dimension of the hidden representations.
67
+ intermediate_size (`int`, *optional*, defaults to 29568):
68
+ Dimension of the MLP representations.
69
+ num_hidden_layers (`int`, *optional*, defaults to 80):
70
+ Number of hidden layers in the Transformer encoder.
71
+ num_attention_heads (`int`, *optional*, defaults to 64):
72
+ Number of attention heads for each attention layer in the Transformer encoder.
73
+ num_key_value_heads (`int`, *optional*, defaults to 8):
74
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
75
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
76
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
77
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
78
+ by meanpooling all the original heads within that group. For more details checkout [this
79
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `32`.
80
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
81
+ The non-linear activation function (function or string) in the decoder.
82
+ max_position_embeddings (`int`, *optional*, defaults to 32768):
83
+ The maximum sequence length that this model might ever be used with.
84
+ initializer_range (`float`, *optional*, defaults to 0.02):
85
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
86
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
87
+ The epsilon used by the rms normalization layers.
88
+ use_cache (`bool`, *optional*, defaults to `True`):
89
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
90
+ relevant if `config.is_decoder=True`.
91
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
92
+ Whether the model's input and output word embeddings should be tied.
93
+ rope_theta (`float`, *optional*, defaults to 1000000.0):
94
+ The base period of the RoPE embeddings.
95
+ use_sliding_window (`bool`, *optional*, defaults to `False`):
96
+ Whether to use sliding window attention.
97
+ sliding_window (`int`, *optional*, defaults to 4096):
98
+ Sliding window attention (SWA) window size. If not specified, will default to `4096`.
99
+ max_window_layers (`int`, *optional*, defaults to 80):
100
+ The number of layers that use SWA (Sliding Window Attention). The bottom layers use SWA while the top use full attention.
101
+ attention_dropout (`float`, *optional*, defaults to 0.0):
102
+ The dropout ratio for the attention probabilities.
103
+ rope_scaling (`Dict`, *optional*):
104
+ Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply new rope type
105
+ and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this value
106
+ accordingly.
107
+ Expected contents:
108
+ `rope_type` (`str`):
109
+ The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
110
+ 'llama3'], with 'default' being the original RoPE implementation.
111
+ `factor` (`float`, *optional*):
112
+ Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
113
+ most scaling types, a `factor` of x will enable the model to handle sequences of length x *
114
+ original maximum pre-trained length.
115
+ `original_max_position_embeddings` (`int`, *optional*):
116
+ Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
117
+ pretraining.
118
+ `attention_factor` (`float`, *optional*):
119
+ Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
120
+ computation. If unspecified, it defaults to value recommended by the implementation, using the
121
+ `factor` field to infer the suggested value.
122
+ `beta_fast` (`float`, *optional*):
123
+ Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
124
+ ramp function. If unspecified, it defaults to 32.
125
+ `beta_slow` (`float`, *optional*):
126
+ Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
127
+ ramp function. If unspecified, it defaults to 1.
128
+ `short_factor` (`List[float]`, *optional*):
129
+ Only used with 'longrope'. The scaling factor to be applied to short contexts (<
130
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
131
+ size divided by the number of attention heads divided by 2
132
+ `long_factor` (`List[float]`, *optional*):
133
+ Only used with 'longrope'. The scaling factor to be applied to long contexts (<
134
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
135
+ size divided by the number of attention heads divided by 2
136
+ `low_freq_factor` (`float`, *optional*):
137
+ Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE
138
+ `high_freq_factor` (`float`, *optional*):
139
+ Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE
140
+ image_token_id (`int`, *optional*):
141
+ Token index used as placeholder for image embeddings.
142
+ video_token_id (`int`, *optional*):
143
+ Token index used as placeholder for video embeddings.
144
+
145
+ """
146
+
147
+ model_type = "LLaVAOneVision1_5_text"
148
+ base_config_key = "text_config"
149
+ keys_to_ignore_at_inference = ["past_key_values"]
150
+ # Default tensor parallel plan for base model `Qwen2VL`
151
+ base_model_tp_plan = {
152
+ "layers.*.self_attn.q_proj": "colwise",
153
+ "layers.*.self_attn.k_proj": "colwise",
154
+ "layers.*.self_attn.v_proj": "colwise",
155
+ "layers.*.self_attn.o_proj": "rowwise",
156
+ "layers.*.mlp.gate_proj": "colwise",
157
+ "layers.*.mlp.up_proj": "colwise",
158
+ "layers.*.mlp.down_proj": "rowwise",
159
+ }
160
+ base_model_pp_plan = {
161
+ "embed_tokens": (["input_ids"], ["inputs_embeds"]),
162
+ "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
163
+ "norm": (["hidden_states"], ["hidden_states"]),
164
+ }
165
+
166
+ def __init__(
167
+ self,
168
+ vocab_size=151936,
169
+ hidden_size=4096,
170
+ intermediate_size=12288,
171
+ num_hidden_layers=36,
172
+ num_attention_heads=32,
173
+ num_key_value_heads=8,
174
+ head_dim=128,
175
+ hidden_act="silu",
176
+ max_position_embeddings=32768,
177
+ initializer_range=0.02,
178
+ rms_norm_eps=1e-06,
179
+ use_cache=True,
180
+ tie_word_embeddings=False,
181
+ rope_theta=1000000.0,
182
+ attention_bias=False,
183
+ use_sliding_window=False,
184
+ sliding_window=None,
185
+ max_window_layers=36,
186
+ attention_dropout=0.0,
187
+ rope_scaling=None,
188
+ layer_types=None,
189
+ image_token_id=None,
190
+ video_token_id=None,
191
+ **kwargs,
192
+ ):
193
+ self.vocab_size = vocab_size
194
+ self.max_position_embeddings = max_position_embeddings
195
+ self.hidden_size = hidden_size
196
+ self.intermediate_size = intermediate_size
197
+ self.num_hidden_layers = num_hidden_layers
198
+ self.num_attention_heads = num_attention_heads
199
+ self.use_sliding_window = use_sliding_window
200
+ self.sliding_window = sliding_window
201
+ self.max_window_layers = max_window_layers
202
+
203
+ # for backward compatibility
204
+ if num_key_value_heads is None:
205
+ num_key_value_heads = num_attention_heads
206
+
207
+ self.num_key_value_heads = num_key_value_heads
208
+ self.head_dim = head_dim
209
+ self.hidden_act = hidden_act
210
+ self.initializer_range = initializer_range
211
+ self.rms_norm_eps = rms_norm_eps
212
+ self.use_cache = use_cache
213
+ self.rope_theta = rope_theta
214
+ self.attention_dropout = attention_dropout
215
+ self.rope_scaling = rope_scaling
216
+ self.attention_bias = attention_bias
217
+ self.tie_word_embeddings = tie_word_embeddings
218
+
219
+ # Validate the correctness of rotary position embeddings parameters
220
+ # BC: if there is a 'type' field, move it to 'rope_type'.
221
+ # and change type from 'mrope' to 'default' because `mrope` does default RoPE calculations
222
+ # one can set it to "linear"/"dynamic" etc. to have scaled RoPE
223
+ # TODO: @raushan update config in the hub
224
+ if self.rope_scaling is not None and "type" in self.rope_scaling:
225
+ if self.rope_scaling["type"] == "mrope":
226
+ self.rope_scaling["type"] = "default"
227
+ self.rope_scaling["rope_type"] = self.rope_scaling["type"]
228
+ rope_config_validation(self, ignore_keys={"mrope_section"})
229
+ self.image_token_id = image_token_id
230
+ self.video_token_id = video_token_id
231
+
232
+ self.layer_types = layer_types
233
+ if self.layer_types is None:
234
+ self.layer_types = [
235
+ "sliding_attention"
236
+ if self.sliding_window is not None and i >= self.max_window_layers
237
+ else "full_attention"
238
+ for i in range(self.num_hidden_layers)
239
+ ]
240
+ layer_type_validation(self.layer_types)
241
+
242
+ super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)
243
+
244
+
245
+ class Llavaonevision1_5Config(PretrainedConfig):
246
+ r"""
247
+ Args:
248
+ text_config (`Union[PreTrainedConfig, dict]`, *optional*, defaults to `LLaVAOneVision1_5_TextConfig`):
249
+ The config object or dictionary of the text backbone.
250
+ vision_config (`Union[PreTrainedConfig, dict]`, *optional*, defaults to `LLaVAOneVision1_5_VisionConfig`):
251
+ The config object or dictionary of the vision backbone.
252
+ image_token_id (`int`, *optional*, defaults to 151655):
253
+ The image token index to encode the image prompt.
254
+ video_token_id (`int`, *optional*, defaults to 151656):
255
+ The video token index to encode the image prompt.
256
+ """
257
+
258
+ model_type = "llavaonevision1_5"
259
+ sub_configs = {"vision_config": RiceConfig, "text_config": LLaVAOneVision1_5_TextConfig}
260
+ keys_to_ignore_at_inference = ["past_key_values"]
261
+
262
+ def __init__(
263
+ self,
264
+ text_config=None,
265
+ vision_config=None,
266
+ image_token_id=151655,
267
+ video_token_id=151656,
268
+ vocab_size=152064,
269
+ **kwargs,
270
+ ):
271
+ if isinstance(vision_config, dict):
272
+ self.vision_config = self.sub_configs["vision_config"](**vision_config)
273
+ elif vision_config is None:
274
+ self.vision_config = self.sub_configs["vision_config"]()
275
+
276
+ if isinstance(text_config, dict):
277
+ self.text_config = self.sub_configs["text_config"](**text_config)
278
+ elif text_config is None:
279
+ # For BC use all kwargs to init `TextConfig`
280
+ self.text_config = self.sub_configs["text_config"](**kwargs)
281
+
282
+ self.image_token_id = image_token_id
283
+ self.video_token_id = video_token_id
284
+ self.vocab_size = vocab_size
285
+
286
+ super().__init__(**kwargs)
287
+
288
+ __all__ = ["Llavaonevision1_5Config", "LLaVAOneVision1_5_TextConfig"]
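A minimal sketch of instantiating the classes above directly (assuming this file is importable from the working directory); note that the class defaults differ from the released checkpoint, which overrides them through `config.json`:

```python
# Sketch: build the config with class defaults only.
from configuration_llavaonevision1_5 import Llavaonevision1_5Config

cfg = Llavaonevision1_5Config()
print(cfg.text_config.num_hidden_layers)       # 36
print(cfg.text_config.hidden_size)             # 4096 (checkpoint uses 2560 via config.json)
print(cfg.vision_config.patch_size)            # 14
print(cfg.image_token_id, cfg.video_token_id)  # 151655 151656
```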
generation_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "bos_token_id": 151643,
3
+ "pad_token_id": 151643,
4
+ "do_sample": true,
5
+ "eos_token_id": 151645,
6
+ "repetition_penalty": 1.05,
7
+ "temperature": 0.000001,
8
+ "_from_model_config": true,
9
+ "transformers_version": "4.53.0"
10
+ }
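Note that `do_sample` is enabled but the temperature is effectively zero, so the default decoding is near-greedy with a mild repetition penalty. These defaults can be overridden per call, as in the sketch below (reusing `model` and `inputs` from the Quick Start; the sampling values are illustrative):

```python
# Sketch: override the repository's generation defaults at call time.
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,          # sample instead of the near-greedy default
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.05,
)
```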
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f82cd85dce1f7a8438ff40d84a269a0fa99a0e34a848c7353e18aabd462fcdec
3
+ size 4563695792
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d61103678768e54825eb35862e8b98e25db536bb1b4528e1822a50e5491cd286
3
+ size 4919602640
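The two `*.safetensors` entries are Git LFS pointers; the shards (roughly 9.5 GB combined) are fetched automatically by `from_pretrained`, or can be downloaded explicitly as sketched below (the repo id is an assumption; substitute the id of this repository):

```python
# Sketch: prefetch the sharded weights and config files with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="lmms-lab/LLaVA-OneVision-1.5-4B-Instruct",
    allow_patterns=["*.safetensors", "*.json", "*.txt", "*.jinja"],
)
print(local_dir)
```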
model.safetensors.index.json ADDED
@@ -0,0 +1,705 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 9483221056
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00002-of-00002.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
18
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
19
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
20
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
21
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
22
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
23
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
26
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
27
+ "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
28
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
29
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
30
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
31
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
33
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
34
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
35
+ "model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
36
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
37
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
38
+ "model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
39
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
40
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
41
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
42
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
43
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
44
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
45
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
46
+ "model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
47
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
49
+ "model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
50
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
51
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
52
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
53
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
54
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
55
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
56
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
58
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
59
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
60
+ "model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
61
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
62
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
63
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
65
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
66
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
67
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
68
+ "model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
69
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
70
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
71
+ "model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
72
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
74
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
75
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
76
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
77
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
78
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
79
+ "model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
81
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
82
+ "model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
83
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
84
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
85
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
86
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
87
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
88
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
89
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
90
+ "model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
91
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
92
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
93
+ "model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
94
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
95
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
97
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
98
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
99
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
100
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
101
+ "model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
102
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
103
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
104
+ "model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
106
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
107
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
108
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
109
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
110
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
111
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
112
+ "model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
113
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
114
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
115
+ "model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
116
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
117
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
118
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
119
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
120
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
122
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
123
+ "model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
124
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
125
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
126
+ "model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
127
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
128
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
129
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
130
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
131
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
132
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
133
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
134
+ "model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
135
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
136
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
137
+ "model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
138
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
139
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
140
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
141
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
142
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
143
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
145
+ "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
146
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
147
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
148
+ "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
149
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
150
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
151
+ "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
152
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
153
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
154
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
155
+ "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
156
+ "model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
158
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
159
+ "model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
161
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
162
+ "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
163
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
164
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
165
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
166
+ "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
167
+ "model.layers.21.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
168
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
169
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
170
+ "model.layers.21.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
171
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
172
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
173
+ "model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
174
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
175
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
176
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
177
+ "model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
178
+ "model.layers.22.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
179
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
180
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
181
+ "model.layers.22.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
182
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
183
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
184
+ "model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
185
+ "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
186
+ "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
187
+ "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
188
+ "model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
189
+ "model.layers.23.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
190
+ "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
191
+ "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
192
+ "model.layers.23.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
193
+ "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
194
+ "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
195
+ "model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
196
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
197
+ "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
198
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
199
+ "model.layers.24.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
200
+ "model.layers.24.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
201
+ "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
202
+ "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
203
+ "model.layers.24.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
204
+ "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
205
+ "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
206
+ "model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
207
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
208
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
209
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
210
+ "model.layers.25.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
211
+ "model.layers.25.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
212
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
213
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
214
+ "model.layers.25.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
215
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
216
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
217
+ "model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
218
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
219
+ "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
220
+ "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
221
+ "model.layers.26.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
222
+ "model.layers.26.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
223
+ "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
224
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
225
+ "model.layers.26.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
226
+ "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
227
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
228
+ "model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
229
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
230
+ "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
231
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
232
+ "model.layers.27.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
233
+ "model.layers.27.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
234
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
235
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
236
+ "model.layers.27.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
237
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
238
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
239
+ "model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
240
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
241
+ "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
242
+ "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
243
+ "model.layers.28.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
244
+ "model.layers.28.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
245
+ "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
246
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
247
+ "model.layers.28.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
248
+ "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
249
+ "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
250
+ "model.layers.29.input_layernorm.weight": "model-00001-of-00002.safetensors",
251
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
252
+ "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
253
+ "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
254
+ "model.layers.29.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
255
+ "model.layers.29.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
256
+ "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
257
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
258
+ "model.layers.29.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
259
+ "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
260
+ "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
261
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
262
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
263
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
264
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
265
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
266
+ "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
267
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
268
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
269
+ "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
270
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
271
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
272
+ "model.layers.30.input_layernorm.weight": "model-00001-of-00002.safetensors",
273
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
274
+ "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
275
+ "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
276
+ "model.layers.30.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
277
+ "model.layers.30.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
278
+ "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
279
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
280
+ "model.layers.30.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
281
+ "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
282
+ "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
283
+ "model.layers.31.input_layernorm.weight": "model-00001-of-00002.safetensors",
284
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
285
+ "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
286
+ "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
287
+ "model.layers.31.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
288
+ "model.layers.31.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
289
+ "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
290
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
291
+ "model.layers.31.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
292
+ "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
293
+ "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
294
+ "model.layers.32.input_layernorm.weight": "model-00001-of-00002.safetensors",
295
+ "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
296
+ "model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
297
+ "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
298
+ "model.layers.32.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
299
+ "model.layers.32.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
300
+ "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
301
+ "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
302
+ "model.layers.32.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
303
+ "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
304
+ "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
305
+ "model.layers.33.input_layernorm.weight": "model-00001-of-00002.safetensors",
306
+ "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
307
+ "model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
308
+ "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
309
+ "model.layers.33.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
310
+ "model.layers.33.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
311
+ "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
312
+ "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
313
+ "model.layers.33.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
314
+ "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
315
+ "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
316
+ "model.layers.34.input_layernorm.weight": "model-00001-of-00002.safetensors",
317
+ "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
318
+ "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
319
+ "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
320
+ "model.layers.34.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
321
+ "model.layers.34.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
322
+ "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
323
+ "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
324
+ "model.layers.34.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
325
+ "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
326
+ "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
327
+ "model.layers.35.input_layernorm.weight": "model-00001-of-00002.safetensors",
328
+ "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
329
+ "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
330
+ "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
331
+ "model.layers.35.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
332
+ "model.layers.35.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
333
+ "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
334
+ "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
335
+ "model.layers.35.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
336
+ "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
337
+ "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
338
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
339
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
340
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
341
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
342
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
343
+ "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
344
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
345
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
346
+ "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
347
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
348
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
349
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
350
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
351
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
352
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
353
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
354
+ "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
355
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
356
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
357
+ "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
358
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
359
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
360
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
361
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
362
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
363
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
364
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
365
+ "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
366
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
367
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
368
+ "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
369
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
370
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
371
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
372
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
373
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
374
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
375
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
376
+ "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
377
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
378
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
379
+ "model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
380
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
381
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
382
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
383
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
384
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
385
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
386
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
387
+ "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
388
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
389
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
390
+ "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
391
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
392
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
393
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
394
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
395
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
396
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
397
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
398
+ "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
399
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
400
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
401
+ "model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
402
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
403
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
404
+ "model.norm.weight": "model-00001-of-00002.safetensors",
405
+ "visual.blocks.0.attn.proj.bias": "model-00002-of-00002.safetensors",
406
+ "visual.blocks.0.attn.proj.weight": "model-00002-of-00002.safetensors",
407
+ "visual.blocks.0.attn.qkv.bias": "model-00002-of-00002.safetensors",
408
+ "visual.blocks.0.attn.qkv.weight": "model-00002-of-00002.safetensors",
409
+ "visual.blocks.0.mlp.fc1.bias": "model-00002-of-00002.safetensors",
410
+ "visual.blocks.0.mlp.fc1.weight": "model-00002-of-00002.safetensors",
411
+ "visual.blocks.0.mlp.fc2.bias": "model-00002-of-00002.safetensors",
412
+ "visual.blocks.0.mlp.fc2.weight": "model-00002-of-00002.safetensors",
413
+ "visual.blocks.0.norm1.bias": "model-00002-of-00002.safetensors",
414
+ "visual.blocks.0.norm1.weight": "model-00002-of-00002.safetensors",
415
+ "visual.blocks.0.norm2.bias": "model-00002-of-00002.safetensors",
416
+ "visual.blocks.0.norm2.weight": "model-00002-of-00002.safetensors",
417
+ "visual.blocks.1.attn.proj.bias": "model-00002-of-00002.safetensors",
418
+ "visual.blocks.1.attn.proj.weight": "model-00002-of-00002.safetensors",
419
+ "visual.blocks.1.attn.qkv.bias": "model-00002-of-00002.safetensors",
420
+ "visual.blocks.1.attn.qkv.weight": "model-00002-of-00002.safetensors",
421
+ "visual.blocks.1.mlp.fc1.bias": "model-00002-of-00002.safetensors",
422
+ "visual.blocks.1.mlp.fc1.weight": "model-00002-of-00002.safetensors",
423
+ "visual.blocks.1.mlp.fc2.bias": "model-00002-of-00002.safetensors",
424
+ "visual.blocks.1.mlp.fc2.weight": "model-00002-of-00002.safetensors",
425
+ "visual.blocks.1.norm1.bias": "model-00002-of-00002.safetensors",
426
+ "visual.blocks.1.norm1.weight": "model-00002-of-00002.safetensors",
427
+ "visual.blocks.1.norm2.bias": "model-00002-of-00002.safetensors",
428
+ "visual.blocks.1.norm2.weight": "model-00002-of-00002.safetensors",
429
+ "visual.blocks.10.attn.proj.bias": "model-00002-of-00002.safetensors",
430
+ "visual.blocks.10.attn.proj.weight": "model-00002-of-00002.safetensors",
431
+ "visual.blocks.10.attn.qkv.bias": "model-00002-of-00002.safetensors",
432
+ "visual.blocks.10.attn.qkv.weight": "model-00002-of-00002.safetensors",
433
+ "visual.blocks.10.mlp.fc1.bias": "model-00002-of-00002.safetensors",
434
+ "visual.blocks.10.mlp.fc1.weight": "model-00002-of-00002.safetensors",
435
+ "visual.blocks.10.mlp.fc2.bias": "model-00002-of-00002.safetensors",
436
+ "visual.blocks.10.mlp.fc2.weight": "model-00002-of-00002.safetensors",
437
+ "visual.blocks.10.norm1.bias": "model-00002-of-00002.safetensors",
438
+ "visual.blocks.10.norm1.weight": "model-00002-of-00002.safetensors",
439
+ "visual.blocks.10.norm2.bias": "model-00002-of-00002.safetensors",
440
+ "visual.blocks.10.norm2.weight": "model-00002-of-00002.safetensors",
441
+ "visual.blocks.11.attn.proj.bias": "model-00002-of-00002.safetensors",
442
+ "visual.blocks.11.attn.proj.weight": "model-00002-of-00002.safetensors",
443
+ "visual.blocks.11.attn.qkv.bias": "model-00002-of-00002.safetensors",
444
+ "visual.blocks.11.attn.qkv.weight": "model-00002-of-00002.safetensors",
445
+ "visual.blocks.11.mlp.fc1.bias": "model-00002-of-00002.safetensors",
446
+ "visual.blocks.11.mlp.fc1.weight": "model-00002-of-00002.safetensors",
447
+ "visual.blocks.11.mlp.fc2.bias": "model-00002-of-00002.safetensors",
448
+ "visual.blocks.11.mlp.fc2.weight": "model-00002-of-00002.safetensors",
449
+ "visual.blocks.11.norm1.bias": "model-00002-of-00002.safetensors",
450
+ "visual.blocks.11.norm1.weight": "model-00002-of-00002.safetensors",
451
+ "visual.blocks.11.norm2.bias": "model-00002-of-00002.safetensors",
452
+ "visual.blocks.11.norm2.weight": "model-00002-of-00002.safetensors",
453
+ "visual.blocks.12.attn.proj.bias": "model-00002-of-00002.safetensors",
454
+ "visual.blocks.12.attn.proj.weight": "model-00002-of-00002.safetensors",
455
+ "visual.blocks.12.attn.qkv.bias": "model-00002-of-00002.safetensors",
456
+ "visual.blocks.12.attn.qkv.weight": "model-00002-of-00002.safetensors",
457
+ "visual.blocks.12.mlp.fc1.bias": "model-00002-of-00002.safetensors",
458
+ "visual.blocks.12.mlp.fc1.weight": "model-00002-of-00002.safetensors",
459
+ "visual.blocks.12.mlp.fc2.bias": "model-00002-of-00002.safetensors",
460
+ "visual.blocks.12.mlp.fc2.weight": "model-00002-of-00002.safetensors",
461
+ "visual.blocks.12.norm1.bias": "model-00002-of-00002.safetensors",
462
+ "visual.blocks.12.norm1.weight": "model-00002-of-00002.safetensors",
463
+ "visual.blocks.12.norm2.bias": "model-00002-of-00002.safetensors",
464
+ "visual.blocks.12.norm2.weight": "model-00002-of-00002.safetensors",
465
+ "visual.blocks.13.attn.proj.bias": "model-00002-of-00002.safetensors",
466
+ "visual.blocks.13.attn.proj.weight": "model-00002-of-00002.safetensors",
467
+ "visual.blocks.13.attn.qkv.bias": "model-00002-of-00002.safetensors",
468
+ "visual.blocks.13.attn.qkv.weight": "model-00002-of-00002.safetensors",
469
+ "visual.blocks.13.mlp.fc1.bias": "model-00002-of-00002.safetensors",
470
+ "visual.blocks.13.mlp.fc1.weight": "model-00002-of-00002.safetensors",
471
+ "visual.blocks.13.mlp.fc2.bias": "model-00002-of-00002.safetensors",
472
+ "visual.blocks.13.mlp.fc2.weight": "model-00002-of-00002.safetensors",
473
+ "visual.blocks.13.norm1.bias": "model-00002-of-00002.safetensors",
474
+ "visual.blocks.13.norm1.weight": "model-00002-of-00002.safetensors",
475
+ "visual.blocks.13.norm2.bias": "model-00002-of-00002.safetensors",
476
+ "visual.blocks.13.norm2.weight": "model-00002-of-00002.safetensors",
477
+ "visual.blocks.14.attn.proj.bias": "model-00002-of-00002.safetensors",
478
+ "visual.blocks.14.attn.proj.weight": "model-00002-of-00002.safetensors",
479
+ "visual.blocks.14.attn.qkv.bias": "model-00002-of-00002.safetensors",
480
+ "visual.blocks.14.attn.qkv.weight": "model-00002-of-00002.safetensors",
481
+ "visual.blocks.14.mlp.fc1.bias": "model-00002-of-00002.safetensors",
482
+ "visual.blocks.14.mlp.fc1.weight": "model-00002-of-00002.safetensors",
483
+ "visual.blocks.14.mlp.fc2.bias": "model-00002-of-00002.safetensors",
484
+ "visual.blocks.14.mlp.fc2.weight": "model-00002-of-00002.safetensors",
485
+ "visual.blocks.14.norm1.bias": "model-00002-of-00002.safetensors",
486
+ "visual.blocks.14.norm1.weight": "model-00002-of-00002.safetensors",
487
+ "visual.blocks.14.norm2.bias": "model-00002-of-00002.safetensors",
488
+ "visual.blocks.14.norm2.weight": "model-00002-of-00002.safetensors",
489
+ "visual.blocks.15.attn.proj.bias": "model-00002-of-00002.safetensors",
490
+ "visual.blocks.15.attn.proj.weight": "model-00002-of-00002.safetensors",
491
+ "visual.blocks.15.attn.qkv.bias": "model-00002-of-00002.safetensors",
492
+ "visual.blocks.15.attn.qkv.weight": "model-00002-of-00002.safetensors",
493
+ "visual.blocks.15.mlp.fc1.bias": "model-00002-of-00002.safetensors",
494
+ "visual.blocks.15.mlp.fc1.weight": "model-00002-of-00002.safetensors",
495
+ "visual.blocks.15.mlp.fc2.bias": "model-00002-of-00002.safetensors",
496
+ "visual.blocks.15.mlp.fc2.weight": "model-00002-of-00002.safetensors",
497
+ "visual.blocks.15.norm1.bias": "model-00002-of-00002.safetensors",
498
+ "visual.blocks.15.norm1.weight": "model-00002-of-00002.safetensors",
499
+ "visual.blocks.15.norm2.bias": "model-00002-of-00002.safetensors",
500
+ "visual.blocks.15.norm2.weight": "model-00002-of-00002.safetensors",
501
+ "visual.blocks.16.attn.proj.bias": "model-00002-of-00002.safetensors",
502
+ "visual.blocks.16.attn.proj.weight": "model-00002-of-00002.safetensors",
503
+ "visual.blocks.16.attn.qkv.bias": "model-00002-of-00002.safetensors",
504
+ "visual.blocks.16.attn.qkv.weight": "model-00002-of-00002.safetensors",
505
+ "visual.blocks.16.mlp.fc1.bias": "model-00002-of-00002.safetensors",
506
+ "visual.blocks.16.mlp.fc1.weight": "model-00002-of-00002.safetensors",
507
+ "visual.blocks.16.mlp.fc2.bias": "model-00002-of-00002.safetensors",
508
+ "visual.blocks.16.mlp.fc2.weight": "model-00002-of-00002.safetensors",
509
+ "visual.blocks.16.norm1.bias": "model-00002-of-00002.safetensors",
510
+ "visual.blocks.16.norm1.weight": "model-00002-of-00002.safetensors",
511
+ "visual.blocks.16.norm2.bias": "model-00002-of-00002.safetensors",
512
+ "visual.blocks.16.norm2.weight": "model-00002-of-00002.safetensors",
513
+ "visual.blocks.17.attn.proj.bias": "model-00002-of-00002.safetensors",
514
+ "visual.blocks.17.attn.proj.weight": "model-00002-of-00002.safetensors",
515
+ "visual.blocks.17.attn.qkv.bias": "model-00002-of-00002.safetensors",
516
+ "visual.blocks.17.attn.qkv.weight": "model-00002-of-00002.safetensors",
517
+ "visual.blocks.17.mlp.fc1.bias": "model-00002-of-00002.safetensors",
518
+ "visual.blocks.17.mlp.fc1.weight": "model-00002-of-00002.safetensors",
519
+ "visual.blocks.17.mlp.fc2.bias": "model-00002-of-00002.safetensors",
520
+ "visual.blocks.17.mlp.fc2.weight": "model-00002-of-00002.safetensors",
521
+ "visual.blocks.17.norm1.bias": "model-00002-of-00002.safetensors",
522
+ "visual.blocks.17.norm1.weight": "model-00002-of-00002.safetensors",
523
+ "visual.blocks.17.norm2.bias": "model-00002-of-00002.safetensors",
524
+ "visual.blocks.17.norm2.weight": "model-00002-of-00002.safetensors",
525
+ "visual.blocks.18.attn.proj.bias": "model-00002-of-00002.safetensors",
526
+ "visual.blocks.18.attn.proj.weight": "model-00002-of-00002.safetensors",
527
+ "visual.blocks.18.attn.qkv.bias": "model-00002-of-00002.safetensors",
528
+ "visual.blocks.18.attn.qkv.weight": "model-00002-of-00002.safetensors",
529
+ "visual.blocks.18.mlp.fc1.bias": "model-00002-of-00002.safetensors",
530
+ "visual.blocks.18.mlp.fc1.weight": "model-00002-of-00002.safetensors",
531
+ "visual.blocks.18.mlp.fc2.bias": "model-00002-of-00002.safetensors",
532
+ "visual.blocks.18.mlp.fc2.weight": "model-00002-of-00002.safetensors",
533
+ "visual.blocks.18.norm1.bias": "model-00002-of-00002.safetensors",
534
+ "visual.blocks.18.norm1.weight": "model-00002-of-00002.safetensors",
535
+ "visual.blocks.18.norm2.bias": "model-00002-of-00002.safetensors",
536
+ "visual.blocks.18.norm2.weight": "model-00002-of-00002.safetensors",
537
+ "visual.blocks.19.attn.proj.bias": "model-00002-of-00002.safetensors",
538
+ "visual.blocks.19.attn.proj.weight": "model-00002-of-00002.safetensors",
539
+ "visual.blocks.19.attn.qkv.bias": "model-00002-of-00002.safetensors",
540
+ "visual.blocks.19.attn.qkv.weight": "model-00002-of-00002.safetensors",
541
+ "visual.blocks.19.mlp.fc1.bias": "model-00002-of-00002.safetensors",
542
+ "visual.blocks.19.mlp.fc1.weight": "model-00002-of-00002.safetensors",
543
+ "visual.blocks.19.mlp.fc2.bias": "model-00002-of-00002.safetensors",
544
+ "visual.blocks.19.mlp.fc2.weight": "model-00002-of-00002.safetensors",
545
+ "visual.blocks.19.norm1.bias": "model-00002-of-00002.safetensors",
546
+ "visual.blocks.19.norm1.weight": "model-00002-of-00002.safetensors",
547
+ "visual.blocks.19.norm2.bias": "model-00002-of-00002.safetensors",
548
+ "visual.blocks.19.norm2.weight": "model-00002-of-00002.safetensors",
549
+ "visual.blocks.2.attn.proj.bias": "model-00002-of-00002.safetensors",
550
+ "visual.blocks.2.attn.proj.weight": "model-00002-of-00002.safetensors",
551
+ "visual.blocks.2.attn.qkv.bias": "model-00002-of-00002.safetensors",
552
+ "visual.blocks.2.attn.qkv.weight": "model-00002-of-00002.safetensors",
553
+ "visual.blocks.2.mlp.fc1.bias": "model-00002-of-00002.safetensors",
554
+ "visual.blocks.2.mlp.fc1.weight": "model-00002-of-00002.safetensors",
555
+ "visual.blocks.2.mlp.fc2.bias": "model-00002-of-00002.safetensors",
556
+ "visual.blocks.2.mlp.fc2.weight": "model-00002-of-00002.safetensors",
557
+ "visual.blocks.2.norm1.bias": "model-00002-of-00002.safetensors",
558
+ "visual.blocks.2.norm1.weight": "model-00002-of-00002.safetensors",
559
+ "visual.blocks.2.norm2.bias": "model-00002-of-00002.safetensors",
560
+ "visual.blocks.2.norm2.weight": "model-00002-of-00002.safetensors",
561
+ "visual.blocks.20.attn.proj.bias": "model-00002-of-00002.safetensors",
562
+ "visual.blocks.20.attn.proj.weight": "model-00002-of-00002.safetensors",
563
+ "visual.blocks.20.attn.qkv.bias": "model-00002-of-00002.safetensors",
564
+ "visual.blocks.20.attn.qkv.weight": "model-00002-of-00002.safetensors",
565
+ "visual.blocks.20.mlp.fc1.bias": "model-00002-of-00002.safetensors",
566
+ "visual.blocks.20.mlp.fc1.weight": "model-00002-of-00002.safetensors",
567
+ "visual.blocks.20.mlp.fc2.bias": "model-00002-of-00002.safetensors",
568
+ "visual.blocks.20.mlp.fc2.weight": "model-00002-of-00002.safetensors",
569
+ "visual.blocks.20.norm1.bias": "model-00002-of-00002.safetensors",
570
+ "visual.blocks.20.norm1.weight": "model-00002-of-00002.safetensors",
571
+ "visual.blocks.20.norm2.bias": "model-00002-of-00002.safetensors",
572
+ "visual.blocks.20.norm2.weight": "model-00002-of-00002.safetensors",
573
+ "visual.blocks.21.attn.proj.bias": "model-00002-of-00002.safetensors",
574
+ "visual.blocks.21.attn.proj.weight": "model-00002-of-00002.safetensors",
575
+ "visual.blocks.21.attn.qkv.bias": "model-00002-of-00002.safetensors",
576
+ "visual.blocks.21.attn.qkv.weight": "model-00002-of-00002.safetensors",
577
+ "visual.blocks.21.mlp.fc1.bias": "model-00002-of-00002.safetensors",
578
+ "visual.blocks.21.mlp.fc1.weight": "model-00002-of-00002.safetensors",
579
+ "visual.blocks.21.mlp.fc2.bias": "model-00002-of-00002.safetensors",
580
+ "visual.blocks.21.mlp.fc2.weight": "model-00002-of-00002.safetensors",
581
+ "visual.blocks.21.norm1.bias": "model-00002-of-00002.safetensors",
582
+ "visual.blocks.21.norm1.weight": "model-00002-of-00002.safetensors",
583
+ "visual.blocks.21.norm2.bias": "model-00002-of-00002.safetensors",
584
+ "visual.blocks.21.norm2.weight": "model-00002-of-00002.safetensors",
585
+ "visual.blocks.22.attn.proj.bias": "model-00002-of-00002.safetensors",
586
+ "visual.blocks.22.attn.proj.weight": "model-00002-of-00002.safetensors",
587
+ "visual.blocks.22.attn.qkv.bias": "model-00002-of-00002.safetensors",
588
+ "visual.blocks.22.attn.qkv.weight": "model-00002-of-00002.safetensors",
589
+ "visual.blocks.22.mlp.fc1.bias": "model-00002-of-00002.safetensors",
590
+ "visual.blocks.22.mlp.fc1.weight": "model-00002-of-00002.safetensors",
591
+ "visual.blocks.22.mlp.fc2.bias": "model-00002-of-00002.safetensors",
592
+ "visual.blocks.22.mlp.fc2.weight": "model-00002-of-00002.safetensors",
593
+ "visual.blocks.22.norm1.bias": "model-00002-of-00002.safetensors",
594
+ "visual.blocks.22.norm1.weight": "model-00002-of-00002.safetensors",
595
+ "visual.blocks.22.norm2.bias": "model-00002-of-00002.safetensors",
596
+ "visual.blocks.22.norm2.weight": "model-00002-of-00002.safetensors",
597
+ "visual.blocks.23.attn.proj.bias": "model-00002-of-00002.safetensors",
598
+ "visual.blocks.23.attn.proj.weight": "model-00002-of-00002.safetensors",
599
+ "visual.blocks.23.attn.qkv.bias": "model-00002-of-00002.safetensors",
600
+ "visual.blocks.23.attn.qkv.weight": "model-00002-of-00002.safetensors",
601
+ "visual.blocks.23.mlp.fc1.bias": "model-00002-of-00002.safetensors",
602
+ "visual.blocks.23.mlp.fc1.weight": "model-00002-of-00002.safetensors",
603
+ "visual.blocks.23.mlp.fc2.bias": "model-00002-of-00002.safetensors",
604
+ "visual.blocks.23.mlp.fc2.weight": "model-00002-of-00002.safetensors",
605
+ "visual.blocks.23.norm1.bias": "model-00002-of-00002.safetensors",
606
+ "visual.blocks.23.norm1.weight": "model-00002-of-00002.safetensors",
607
+ "visual.blocks.23.norm2.bias": "model-00002-of-00002.safetensors",
608
+ "visual.blocks.23.norm2.weight": "model-00002-of-00002.safetensors",
609
+ "visual.blocks.3.attn.proj.bias": "model-00002-of-00002.safetensors",
610
+ "visual.blocks.3.attn.proj.weight": "model-00002-of-00002.safetensors",
611
+ "visual.blocks.3.attn.qkv.bias": "model-00002-of-00002.safetensors",
612
+ "visual.blocks.3.attn.qkv.weight": "model-00002-of-00002.safetensors",
613
+ "visual.blocks.3.mlp.fc1.bias": "model-00002-of-00002.safetensors",
614
+ "visual.blocks.3.mlp.fc1.weight": "model-00002-of-00002.safetensors",
615
+ "visual.blocks.3.mlp.fc2.bias": "model-00002-of-00002.safetensors",
616
+ "visual.blocks.3.mlp.fc2.weight": "model-00002-of-00002.safetensors",
617
+ "visual.blocks.3.norm1.bias": "model-00002-of-00002.safetensors",
618
+ "visual.blocks.3.norm1.weight": "model-00002-of-00002.safetensors",
619
+ "visual.blocks.3.norm2.bias": "model-00002-of-00002.safetensors",
620
+ "visual.blocks.3.norm2.weight": "model-00002-of-00002.safetensors",
621
+ "visual.blocks.4.attn.proj.bias": "model-00002-of-00002.safetensors",
622
+ "visual.blocks.4.attn.proj.weight": "model-00002-of-00002.safetensors",
623
+ "visual.blocks.4.attn.qkv.bias": "model-00002-of-00002.safetensors",
624
+ "visual.blocks.4.attn.qkv.weight": "model-00002-of-00002.safetensors",
625
+ "visual.blocks.4.mlp.fc1.bias": "model-00002-of-00002.safetensors",
626
+ "visual.blocks.4.mlp.fc1.weight": "model-00002-of-00002.safetensors",
627
+ "visual.blocks.4.mlp.fc2.bias": "model-00002-of-00002.safetensors",
628
+ "visual.blocks.4.mlp.fc2.weight": "model-00002-of-00002.safetensors",
629
+ "visual.blocks.4.norm1.bias": "model-00002-of-00002.safetensors",
630
+ "visual.blocks.4.norm1.weight": "model-00002-of-00002.safetensors",
631
+ "visual.blocks.4.norm2.bias": "model-00002-of-00002.safetensors",
632
+ "visual.blocks.4.norm2.weight": "model-00002-of-00002.safetensors",
633
+ "visual.blocks.5.attn.proj.bias": "model-00002-of-00002.safetensors",
634
+ "visual.blocks.5.attn.proj.weight": "model-00002-of-00002.safetensors",
635
+ "visual.blocks.5.attn.qkv.bias": "model-00002-of-00002.safetensors",
636
+ "visual.blocks.5.attn.qkv.weight": "model-00002-of-00002.safetensors",
637
+ "visual.blocks.5.mlp.fc1.bias": "model-00002-of-00002.safetensors",
638
+ "visual.blocks.5.mlp.fc1.weight": "model-00002-of-00002.safetensors",
639
+ "visual.blocks.5.mlp.fc2.bias": "model-00002-of-00002.safetensors",
640
+ "visual.blocks.5.mlp.fc2.weight": "model-00002-of-00002.safetensors",
641
+ "visual.blocks.5.norm1.bias": "model-00002-of-00002.safetensors",
642
+ "visual.blocks.5.norm1.weight": "model-00002-of-00002.safetensors",
643
+ "visual.blocks.5.norm2.bias": "model-00002-of-00002.safetensors",
644
+ "visual.blocks.5.norm2.weight": "model-00002-of-00002.safetensors",
645
+ "visual.blocks.6.attn.proj.bias": "model-00002-of-00002.safetensors",
646
+ "visual.blocks.6.attn.proj.weight": "model-00002-of-00002.safetensors",
647
+ "visual.blocks.6.attn.qkv.bias": "model-00002-of-00002.safetensors",
648
+ "visual.blocks.6.attn.qkv.weight": "model-00002-of-00002.safetensors",
649
+ "visual.blocks.6.mlp.fc1.bias": "model-00002-of-00002.safetensors",
650
+ "visual.blocks.6.mlp.fc1.weight": "model-00002-of-00002.safetensors",
651
+ "visual.blocks.6.mlp.fc2.bias": "model-00002-of-00002.safetensors",
652
+ "visual.blocks.6.mlp.fc2.weight": "model-00002-of-00002.safetensors",
653
+ "visual.blocks.6.norm1.bias": "model-00002-of-00002.safetensors",
654
+ "visual.blocks.6.norm1.weight": "model-00002-of-00002.safetensors",
655
+ "visual.blocks.6.norm2.bias": "model-00002-of-00002.safetensors",
656
+ "visual.blocks.6.norm2.weight": "model-00002-of-00002.safetensors",
657
+ "visual.blocks.7.attn.proj.bias": "model-00002-of-00002.safetensors",
658
+ "visual.blocks.7.attn.proj.weight": "model-00002-of-00002.safetensors",
659
+ "visual.blocks.7.attn.qkv.bias": "model-00002-of-00002.safetensors",
660
+ "visual.blocks.7.attn.qkv.weight": "model-00002-of-00002.safetensors",
661
+ "visual.blocks.7.mlp.fc1.bias": "model-00002-of-00002.safetensors",
662
+ "visual.blocks.7.mlp.fc1.weight": "model-00002-of-00002.safetensors",
663
+ "visual.blocks.7.mlp.fc2.bias": "model-00002-of-00002.safetensors",
664
+ "visual.blocks.7.mlp.fc2.weight": "model-00002-of-00002.safetensors",
665
+ "visual.blocks.7.norm1.bias": "model-00002-of-00002.safetensors",
666
+ "visual.blocks.7.norm1.weight": "model-00002-of-00002.safetensors",
667
+ "visual.blocks.7.norm2.bias": "model-00002-of-00002.safetensors",
668
+ "visual.blocks.7.norm2.weight": "model-00002-of-00002.safetensors",
669
+ "visual.blocks.8.attn.proj.bias": "model-00002-of-00002.safetensors",
670
+ "visual.blocks.8.attn.proj.weight": "model-00002-of-00002.safetensors",
671
+ "visual.blocks.8.attn.qkv.bias": "model-00002-of-00002.safetensors",
672
+ "visual.blocks.8.attn.qkv.weight": "model-00002-of-00002.safetensors",
673
+ "visual.blocks.8.mlp.fc1.bias": "model-00002-of-00002.safetensors",
674
+ "visual.blocks.8.mlp.fc1.weight": "model-00002-of-00002.safetensors",
675
+ "visual.blocks.8.mlp.fc2.bias": "model-00002-of-00002.safetensors",
676
+ "visual.blocks.8.mlp.fc2.weight": "model-00002-of-00002.safetensors",
677
+ "visual.blocks.8.norm1.bias": "model-00002-of-00002.safetensors",
678
+ "visual.blocks.8.norm1.weight": "model-00002-of-00002.safetensors",
679
+ "visual.blocks.8.norm2.bias": "model-00002-of-00002.safetensors",
680
+ "visual.blocks.8.norm2.weight": "model-00002-of-00002.safetensors",
681
+ "visual.blocks.9.attn.proj.bias": "model-00002-of-00002.safetensors",
682
+ "visual.blocks.9.attn.proj.weight": "model-00002-of-00002.safetensors",
683
+ "visual.blocks.9.attn.qkv.bias": "model-00002-of-00002.safetensors",
684
+ "visual.blocks.9.attn.qkv.weight": "model-00002-of-00002.safetensors",
685
+ "visual.blocks.9.mlp.fc1.bias": "model-00002-of-00002.safetensors",
686
+ "visual.blocks.9.mlp.fc1.weight": "model-00002-of-00002.safetensors",
687
+ "visual.blocks.9.mlp.fc2.bias": "model-00002-of-00002.safetensors",
688
+ "visual.blocks.9.mlp.fc2.weight": "model-00002-of-00002.safetensors",
689
+ "visual.blocks.9.norm1.bias": "model-00002-of-00002.safetensors",
690
+ "visual.blocks.9.norm1.weight": "model-00002-of-00002.safetensors",
691
+ "visual.blocks.9.norm2.bias": "model-00002-of-00002.safetensors",
692
+ "visual.blocks.9.norm2.weight": "model-00002-of-00002.safetensors",
693
+ "visual.class_embedding": "model-00002-of-00002.safetensors",
694
+ "visual.class_pos_emb": "model-00002-of-00002.safetensors",
695
+ "visual.merger.ln_q.bias": "model-00002-of-00002.safetensors",
696
+ "visual.merger.ln_q.weight": "model-00002-of-00002.safetensors",
697
+ "visual.merger.mlp.0.bias": "model-00002-of-00002.safetensors",
698
+ "visual.merger.mlp.0.weight": "model-00002-of-00002.safetensors",
699
+ "visual.merger.mlp.2.bias": "model-00002-of-00002.safetensors",
700
+ "visual.merger.mlp.2.weight": "model-00002-of-00002.safetensors",
701
+ "visual.patch_embed.proj.weight": "model-00002-of-00002.safetensors",
702
+ "visual.pre_layernorm.bias": "model-00002-of-00002.safetensors",
703
+ "visual.pre_layernorm.weight": "model-00002-of-00002.safetensors"
704
+ }
705
+ }
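The map above is the tail of `model.safetensors.index.json`: each parameter name points to the shard that stores it, so the language-model layers are split between `model-00001-of-00002.safetensors` and `model-00002-of-00002.safetensors`, while all `visual.*` weights sit in the second shard. The sketch below shows one way to inspect such an index with the `safetensors` library; the local file paths are assumptions for illustration, not part of this commit.

```python
import json
from collections import Counter

from safetensors import safe_open  # pip install safetensors

# Assumed local path: point this at a downloaded snapshot of the repository.
INDEX_PATH = "model.safetensors.index.json"

with open(INDEX_PATH) as f:
    index = json.load(f)

weight_map = index["weight_map"]          # parameter name -> shard file
print(Counter(weight_map.values()))       # how many tensors live in each shard

# Lazily load a single tensor from the shard the index points to.
name = "visual.patch_embed.proj.weight"
with safe_open(weight_map[name], framework="pt") as shard:
    tensor = shard.get_tensor(name)
    print(name, tuple(tensor.shape), tensor.dtype)
```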
modeling_llavaonevision1_5.py ADDED
The diff for this file is too large to render. See raw diff
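Because the model implementation ships as `modeling_llavaonevision1_5.py` inside the repository, loading the checkpoint through `transformers` goes through the remote-code path. A minimal, hedged sketch follows; the repo id, the `AutoModel`/`AutoProcessor` entry points, and `trust_remote_code=True` are assumptions about how the custom classes are registered, not documented usage.

```python
from transformers import AutoModel, AutoProcessor

REPO_ID = "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct"  # assumed repo id

# trust_remote_code lets transformers import the custom classes defined in
# the modeling_llavaonevision1_5.py file this commit adds (assumption).
processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(REPO_ID, trust_remote_code=True, torch_dtype="auto")
print(type(model).__name__)
```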
 
preprocessor_config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "do_convert_rgb": true,
3
+ "do_normalize": true,
4
+ "do_rescale": true,
5
+ "do_resize": true,
6
+ "image_mean": [
7
+ 0.48145466,
8
+ 0.4578275,
9
+ 0.40821073
10
+ ],
11
+ "image_processor_type": "Qwen2VLImageProcessor",
12
+ "image_std": [
13
+ 0.26862954,
14
+ 0.26130258,
15
+ 0.27577711
16
+ ],
17
+ "max_pixels": 2560000,
18
+ "merge_size": 2,
19
+ "min_pixels": 3136,
20
+ "patch_size": 14,
21
+ "processor_class": "Qwen2_5_VLProcessor",
22
+ "resample": 3,
23
+ "rescale_factor": 0.00392156862745098,
24
+ "size": {
25
+ "longest_edge": 12845056,
26
+ "shortest_edge": 3136
27
+ },
28
+ "temporal_patch_size": 1
29
+ }
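`preprocessor_config.json` sets up a Qwen2-VL-style image processor: images stay close to native resolution, every 14×14 patch becomes a visual feature, `merge_size: 2` folds each 2×2 group of patches into a single token, and `min_pixels`/`max_pixels` bound the resized area. The sketch below estimates the resulting token count per image; it is an illustrative approximation of that resize rule, not the processor's exact code.

```python
import math

# Values copied from preprocessor_config.json above.
PATCH_SIZE = 14
MERGE_SIZE = 2
MIN_PIXELS = 3136        # 56 * 56
MAX_PIXELS = 2560000

def estimate_image_tokens(height: int, width: int) -> int:
    """Approximate the Qwen2-VL-style resize: snap both sides to multiples of
    patch_size * merge_size, then rescale so the area stays within
    [MIN_PIXELS, MAX_PIXELS]."""
    factor = PATCH_SIZE * MERGE_SIZE                   # 28 px per merged-token side
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > MAX_PIXELS:                             # shrink oversized images
        scale = math.sqrt(MAX_PIXELS / (h * w))
        h = max(factor, math.floor(h * scale / factor) * factor)
        w = max(factor, math.floor(w * scale / factor) * factor)
    elif h * w < MIN_PIXELS:                           # grow tiny images
        scale = math.sqrt(MIN_PIXELS / (h * w))
        h = math.ceil(h * scale / factor) * factor
        w = math.ceil(w * scale / factor) * factor
    return (h // factor) * (w // factor)               # tokens after the 2x2 merge

print(estimate_image_tokens(1080, 1920))               # e.g. a Full-HD frame
```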
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
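`special_tokens_map.json` follows the Qwen chat conventions: `<|im_end|>` closes a turn and doubles as the EOS token, `<|endoftext|>` is used for padding, and the vision markers (`<|vision_start|>`, `<|image_pad|>`, and friends) are registered as additional special tokens so the BPE never splits them. A small hedged check, assuming the repo id below:

```python
from transformers import AutoTokenizer

REPO_ID = "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct"      # assumed repo id
tok = AutoTokenizer.from_pretrained(REPO_ID)

print(tok.eos_token, tok.pad_token)                       # <|im_end|> <|endoftext|>
print("<|image_pad|>" in tok.additional_special_tokens)   # True

# Special tokens should survive tokenization as single ids.
ids = tok("<|vision_start|><|image_pad|><|vision_end|>")["input_ids"]
print(ids, tok.convert_ids_to_tokens(ids))
```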
tensorboard/instruct/README.md ADDED
File without changes
tensorboard/instruct/events.out.tfevents.1758101239.109436.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:543ddda0e04e35a6bb2e488bfdd641b2f84cbb814bacc29e0936ae780d0b0ab1
3
+ size 92983372
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
3
+ size 11421896
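Both the TensorBoard event file and `tokenizer.json` are tracked with Git LFS, so the diff only shows the three-line pointer (spec version, SHA-256 `oid`, payload `size` in bytes) instead of the payload. After `git lfs pull` (or a download through `huggingface_hub`) the real file replaces the pointer; until then, the pointer can be parsed as plain text. A minimal sketch, with the local path as an assumption:

```python
def parse_lfs_pointer(path: str) -> dict:
    """Read a Git LFS pointer file into its version / oid / size fields."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

# Assumed: a local clone where tokenizer.json is still an un-pulled LFS pointer.
ptr = parse_lfs_pointer("tokenizer.json")
print(ptr["oid"], int(ptr["size"]))   # sha256:9c5a...  11421896
```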
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "clean_up_tokenization_spaces": false,
199
+ "eos_token": "<|im_end|>",
200
+ "errors": "replace",
201
+ "extra_special_tokens": {},
202
+ "model_max_length": 131072,
203
+ "pad_token": "<|endoftext|>",
204
+ "processor_class": "Qwen2_5_VLProcessor",
205
+ "split_special_tokens": false,
206
+ "tokenizer_class": "Qwen2Tokenizer",
207
+ "unk_token": null
208
+ }
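`tokenizer_config.json` pins the tokenizer to the Qwen2 BPE (`Qwen2Tokenizer`) with a 131,072-token `model_max_length` and registers the added tokens from id 151643 upward, including the vision placeholders that the processor expands at inference time. A hedged sanity check against the ids listed above, with the repo id again assumed:

```python
from transformers import AutoTokenizer

REPO_ID = "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct"   # assumed repo id
tok = AutoTokenizer.from_pretrained(REPO_ID)

print(tok.model_max_length)                            # 131072
for token in ("<|endoftext|>", "<|im_start|>", "<|im_end|>",
              "<|image_pad|>", "<|video_pad|>"):
    # Expected ids from added_tokens_decoder: 151643, 151644, 151645, 151655, 151656
    print(token, tok.convert_tokens_to_ids(token))
```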
vocab.json ADDED
The diff for this file is too large to render. See raw diff