Aratako committed on
Commit 78e6263 · verified · 1 Parent(s): e641a78

Update README.md

Files changed (1): README.md (+78 −54)
README.md CHANGED
@@ -1,54 +1,78 @@
- ---
- base_model: []
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # mixtral-upscaled
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the passthrough merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * ./Mixtral-8x7B-Instruct-v0.1
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- merge_method: passthrough
- slices:
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [0, 8]
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [4, 12]
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [8, 16]
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [12, 20]
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [16, 24]
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [20, 28]
- - sources:
-   - model: ./Mixtral-8x7B-Instruct-v0.1
-     layer_range: [24, 32]
- dtype: bfloat16
- tokenizer_source: base
-
- ```
+ ---
+ base_model:
+ - mistralai/Mixtral-8x7B-Instruct-v0.1
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ license: apache-2.0
+ language:
+ - fr
+ - it
+ - de
+ - es
+ - en
+ ---
+ # Mixtral-8x7B-Instruct-v0.1-upscaled
+
+ This is a frankenmerge of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), created by interleaving its own layers using [mergekit](https://github.com/cg123/mergekit).
+
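+ A minimal loading sketch (not part of the original card), assuming this repository's id is `Aratako/Mixtral-8x7B-Instruct-v0.1-upscaled` and that `transformers` and `torch` are installed:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Hypothetical repo id for this merge; substitute the actual repository name.
+ model_id = "Aratako/Mixtral-8x7B-Instruct-v0.1-upscaled"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype="auto",   # the merge is stored in bfloat16
+     device_map="auto",    # shard across available GPUs
+ )
+
+ messages = [{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}]
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
+ output = model.generate(input_ids, max_new_tokens=256)
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+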
+ ## Benchmark
+ The [mt-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) scores for this model and the original model are as follows (avg_score is the mean of the eight category scores):
+
+ **1-turn**
+ |Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ | Mixtral-8x7B-Instruct-v0.1 | 8x7B | 5.3 | **8.5** | **9.9** | **6.8** | 6.0 | 9.1 | 9.55 | 8.9 | 8.00625 |
+ | This model | around 8x12B? | **6.3** | 8.4 | **9.9** | 5.4 | **7.7** | **9.2** | **9.75** | **9.8** | **8.30625** |
+ ![mt-bench-1turn](./mt-bench-1turn.png)
+
+ **2-turn**
+ |Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ | Mixtral-8x7B-Instruct-v0.1 | 8x7B | 4.1 | **8.4** | 9.8 | **4.7** | **5.6** | 9.0 | **9.2** | **9.5** | **7.5375** |
+ | This model | around 8x12B? | **4.2** | 7.4 | **9.9** | 4.0 | 5.2 | **9.5** | 8.7 | 8.0 | 7.1125 |
+ ![mt-bench-2turn](./mt-bench-2turn.png)
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the passthrough merge method.
+
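+ Each slice copies an 8-layer block, and consecutive slices overlap by 4 layers, so the original 32 decoder layers are expanded to 7 × 8 = 56. A quick sketch (illustration only, not part of the merge itself) of the resulting layer stack:
+
+ ```python
+ # Slices from the configuration below: [start, end) layer ranges.
+ slices = [(0, 8), (4, 12), (8, 16), (12, 20), (16, 24), (20, 28), (24, 32)]
+
+ # Passthrough merging simply stacks the selected layers in order.
+ stacked = [layer for start, end in slices for layer in range(start, end)]
+ print(len(stacked))   # 56 layers in the upscaled model
+ print(stacked[:12])   # [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7] -- 4-layer overlap
+ ```
+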
+ ### Models Merged
+
+ The following models were included in the merge:
+ * mistralai/Mixtral-8x7B-Instruct-v0.1
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ merge_method: passthrough
+ slices:
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [0, 8]
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [4, 12]
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [8, 16]
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [12, 20]
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [16, 24]
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [20, 28]
+ - sources:
+   - model: mistralai/Mixtral-8x7B-Instruct-v0.1
+     layer_range: [24, 32]
+ dtype: bfloat16
+ tokenizer_source: base
+ ```
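+
+ To reproduce the merge, this configuration can be saved to a file (e.g. `config.yaml`, a name chosen here for illustration) and passed to mergekit's CLI; with a standard `pip install mergekit` setup the invocation is along the lines of `mergekit-yaml config.yaml ./output-model-directory` (exact flags vary by mergekit version).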