BenevolenceMessiah committed
Commit d90f6e3 · verified · 1 Parent(s): 54ea61e

Update README.md

Files changed (1):
  1. README.md (+30, -10)

README.md CHANGED
@@ -6,6 +6,8 @@ tags:
 - mergekit
 - merge
 license: apache-2.0
+metrics:
+- code_eval
 ---
 # merge
 
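The `metrics: code_eval` entry added above corresponds to the `code_eval` (pass@k) metric from the Hugging Face `evaluate` library. A minimal sketch of scoring completions from this model with it, assuming `evaluate` is installed; the problem, candidate completion, and test string are toy placeholders, not taken from this repository:

```python
# Minimal sketch: scoring generated code with the Hugging Face `code_eval` metric.
# The test case and candidate completion below are toy placeholders.
import os

import evaluate

# code_eval executes untrusted code, so it must be explicitly enabled.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = evaluate.load("code_eval")

test_cases = ["assert add(2, 3) == 5"]               # one reference test per problem
candidates = [["def add(a, b):\n    return a + b"]]  # candidate completions per problem

pass_at_k, results = code_eval.compute(
    references=test_cases,
    predictions=candidates,
    k=[1],
)
print(pass_at_k)  # e.g. {'pass@1': 1.0}
```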
@@ -90,7 +92,7 @@ This resulted in the creation of [BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE](ht
 5. **Efficient Model Scaling**:
    Sparse upcycling allows for **scaling** the model in a way that adds capacity without a proportional increase in computation costs. This is ideal for applications where high capacity is needed, but hardware or energy constraints make it impractical to compute all parameters for every input.
 
-Resultantly, this model went from 8.83B parameters to 53.B parameters:
+As a result, this model grew from 8.83B parameters to 54.3B parameters:
 - **8 experts**: Each expert is another copy of the base model's feed-forward (MLP) blocks (taken from the 9B base), so a naive count of \( 8 \times 9B = 72B \) additional
   parameters would put the total near **81B** (9B + 72B). Because only the MLP blocks are replicated while attention, embedding, and normalization weights stay shared, the merged model actually totals about **54.3B** parameters; and since only a fraction of the experts are activated for each token (sparse activation), far fewer than 54.3B parameters are exercised during any single inference step. This makes the model scalable without requiring computation over the entire parameter space every time.
 - **Sparse Upcycling and Gating Mechanism**:
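A back-of-the-envelope sketch of the parameter arithmetic above, assuming a Mixtral-style sparse upcycling in which only the per-layer MLP blocks are replicated into experts while attention, embedding, and normalization weights stay shared; the ~6.5B MLP figure and the 2-experts-per-token routing are inferences from the 8.83B and 54.3B figures on this card, not values read from the model config:

```python
# Rough parameter arithmetic for a Mixtral-style sparse-upcycled MoE.
# Assumption: experts are copies of the dense model's MLP blocks only;
# attention, embeddings, and norms remain shared across experts.

DENSE_TOTAL = 8.83e9   # Yi-Coder-9B parameter count (from this card)
MLP_PARAMS = 6.5e9     # approx. parameters in the MLP blocks (inferred, not official)
NUM_EXPERTS = 8
ACTIVE_EXPERTS = 2     # typical Mixtral-style top-k routing (assumption)

# Total: the dense model plus (NUM_EXPERTS - 1) extra copies of the MLP blocks.
total = DENSE_TOTAL + (NUM_EXPERTS - 1) * MLP_PARAMS

# Active per token: shared weights plus only the routed experts' MLPs.
active = (DENSE_TOTAL - MLP_PARAMS) + ACTIVE_EXPERTS * MLP_PARAMS

print(f"total parameters ~= {total / 1e9:.1f}B")   # ~54.3B
print(f"active per token ~= {active / 1e9:.1f}B")  # ~15.3B
```

Under these assumptions the total matches the 54.3B figure above, while the parameters actually touched per token stay far below it.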
@@ -113,15 +115,35 @@ Our innovative training regimen involved configuring distinct experts within the
 
 By integrating advanced tools and technologies such as Docker, Kubernetes, and Prometheus within the prompts, the experts were trained to produce code aligned with modern DevOps practices. The merge methods combined the strengths of the individual models while mitigating their limitations, resulting in a model capable of handling a wide array of complex coding tasks across multiple programming languages.
 
-Model Tree and Resources
+#### Model Tree and Resources
 
 The development process can be visualized through the following model hierarchy:
 
-01-ai/Yi-Coder-9B (Base Model)
-01-ai/Yi-Coder-9B-Chat (Chat-Fine-Tuned Model)
-BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES (Result of TIES Merge)
-BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE (Result of Sparse Upcycling Merge)
-BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+1. `01-ai/Yi-Coder-9B` (Base Model)
+2. `01-ai/Yi-Coder-9B-Chat` (Chat-Fine-Tuned Model)
+3. `BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES` (Result of TIES Merge)
+4. `BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE` (Result of Sparse Upcycling Merge)
+5. `BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0` (Final Integrated Model, i.e. this model)
+
+````
+Yi-Coder-9B Model Family Tree
+=============================
+
+01-ai/Yi-Coder-9B (Base Model)
+│
+├── 01-ai/Yi-Coder-9B-Chat (Fine-Tuned Model)
+│   │
+│   ├── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES (TIES Merged Model)
+│   │   │
+│   │   └── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+│   │
+│   └── BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE (Sparse Upcycling Merge)
+│       │
+│       └── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+│
+└── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+
+````
 
 
 ### Configuration
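To make the gating mechanism and the "only a fraction of the experts are used for each token" behaviour concrete, here is a minimal, illustrative sketch of Mixtral-style top-2 routing over 8 experts; the dimensions and router weights are random toy values, not this model's actual parameters:

```python
# Minimal sketch of Mixtral-style top-k expert routing (illustrative only).
import numpy as np

def top_k_route(hidden: np.ndarray, router_w: np.ndarray, k: int = 2):
    """Select k experts for one token and return their indices and mixing weights."""
    logits = hidden @ router_w             # (num_experts,)
    top_idx = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    top_logits = logits[top_idx]
    weights = np.exp(top_logits - top_logits.max())
    weights /= weights.sum()               # softmax over the selected experts only
    return top_idx, weights

rng = np.random.default_rng(0)
hidden_size, num_experts = 64, 8           # toy sizes, not the real model dims
router_w = rng.normal(size=(hidden_size, num_experts))
token_hidden = rng.normal(size=hidden_size)

experts, gate_weights = top_k_route(token_hidden, router_w)
# Only these 2 of the 8 expert MLPs would run for this token; their outputs
# are summed using `gate_weights`, so most expert parameters stay untouched.
print(experts, gate_weights.round(3))
```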
@@ -413,18 +435,16 @@ experts:
 - Maintainability and Scalability: Promotes the creation of modular, scalable, and maintainable codebases, essential for large-scale applications.
 
 Coming Soon:
-BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0-DELLA (DELLA pruned version of this Final Integrated Model)
+- BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0-DELLA (DELLA-pruned version of this Final Integrated Model)
+- BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v2.X (Can we get any more experimental? You bet! Did we break down the barriers between training, fine-tuning, and merging? Maybe...)
 
 #### Intended Outcomes
 
 The primary objectives of this project are:
 
 - `Versatility and Reliability`: Develop a highly versatile MoE model capable of generating production-ready code across numerous programming languages and complex scenarios.
-
 - `Enhanced Efficiency`: Utilize sparse activation and the MoE architecture to increase model capacity without a proportional increase in computational costs.
-
 - `Code Quality and Security`: Ensure the generated code adheres to industry best practices, is secure, and is optimized for performance and scalability.
-
 - `Accelerated Development`: Reduce the time and effort required for software development by providing comprehensive, deployable code snippets that integrate seamlessly into various development pipelines.
 
 #### Conclusion
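As a usage note for the merged model this diff describes, the following is a minimal inference sketch using the Hugging Face `transformers` library; it assumes the repository loads as a standard causal-LM checkpoint with a chat template, and the dtype/device settings are placeholders to adjust for your hardware:

```python
# Minimal inference sketch (assumes a standard Transformers causal-LM checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # adjust for your hardware
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```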
 