Update README.md
README.md CHANGED
@@ -6,6 +6,8 @@ tags:
 - mergekit
 - merge
 license: apache-2.0
+metrics:
+- code_eval
 ---
 # merge

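The front matter above now declares the `code_eval` metric. Below is a minimal sketch of scoring generated completions with that metric via the `evaluate` library; the candidate solution and test case are toy illustrations, not results from this model.

```python
import os
from evaluate import load

# code_eval executes model-generated code, so the evaluate library requires an explicit opt-in.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = load("code_eval")

# Toy example: one problem, one candidate completion, one test case.
candidates = [["def add(a, b):\n    return a + b"]]
tests = ["assert add(2, 3) == 5"]

pass_at_k, results = code_eval.compute(references=tests, predictions=candidates, k=[1])
print(pass_at_k)  # e.g. {'pass@1': 1.0}
```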
@@ -90,7 +92,7 @@ This resulted in the creation of [BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE](ht
 5. **Efficient Model Scaling**:
    Sparse upcycling allows for **scaling** the model in a way that adds capacity without a proportional increase in computation costs. This is ideal for applications where high capacity is needed, but hardware or energy constraints make it impractical to compute all parameters for every input.

-Resultantly, this model went from 8.83B parameters to
+As a result, this model went from 8.83B parameters to 54.3B parameters:
 - **8 experts**: Each expert is another copy of the base model's feed-forward layers (the base model itself is roughly 9B parameters); because the experts share the base model's attention and embedding weights, the upcycled total is roughly **54.3B**
 parameters rather than a naive \( 8 \times 9B = 72B \) of additional parameters. However, since only a fraction of the experts is used for each token (due to sparse activation), the model does not use all 54.3B parameters during every inference step. This makes the model scalable without requiring computation over the entire parameter space every time.
 - **Sparse Upcycling and Gating Mechanism**:
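The arithmetic behind that scaling can be sketched as follows. The shared/expert splits are illustrative placeholders chosen to line up with the ~54.3B total quoted above, and the top-2 routing is the usual Mixtral-style default rather than a confirmed setting of this merge.

```python
# Rough back-of-the-envelope sketch of why a sparse MoE adds capacity
# without a proportional increase in per-token compute.
# All numbers are illustrative placeholders, not measured values for this checkpoint.

SHARED_PARAMS_B = 2.3      # attention, embeddings, norms (shared across all experts)
EXPERT_MLP_PARAMS_B = 6.5  # feed-forward weights that get replicated per expert
NUM_EXPERTS = 8
TOP_K = 2                  # experts routed per token (typical Mixtral-style default)

total_b = SHARED_PARAMS_B + NUM_EXPERTS * EXPERT_MLP_PARAMS_B
active_b = SHARED_PARAMS_B + TOP_K * EXPERT_MLP_PARAMS_B

print(f"total parameters:  ~{total_b:.1f}B")   # ~54.3B with these placeholders
print(f"active per token:  ~{active_b:.1f}B")  # far less than the full parameter count
```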
@@ -113,15 +115,35 @@ Our innovative training regimen involved configuring distinct experts within the

 By integrating advanced tools and technologies such as Docker, Kubernetes, and Prometheus within the prompts, the experts were trained to produce code aligned with modern DevOps practices. The merge methods combined the strengths of the individual models while mitigating their limitations, resulting in a model capable of handling a wide array of complex coding tasks across multiple programming languages.

-Model Tree and Resources
+#### Model Tree and Resources

 The development process can be visualized through the following model hierarchy:

+1. `01-ai/Yi-Coder-9B` (Base Model)
+2. `01-ai/Yi-Coder-9B-Chat` (Chat-Fine-Tuned Model)
+3. `BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES` (Result of TIES Merge)
+4. `BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE` (Result of Sparse Upcycling Merge)
+5. `BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0` (Final Integrated (This) Model)
+
+````
+Yi-Coder-9B Model Family Tree
+=============================
+
 01-ai/Yi-Coder-9B (Base Model)
-
-
-
-BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES
+│
+├── 01-ai/Yi-Coder-9B-Chat (Fine-Tuned Model)
+│   │
+│   ├── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES (TIES Merged Model)
+│   │   │
+│   │   └── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+│   │
+│   └── BenevolenceMessiah/Yi-Coder-9B-Chat-8x-MoE (Sparse Upcycling Merge)
+│       │
+│       └── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+│
+└── BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0 (Final Integrated Model)
+
+````


 ### Configuration
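For reference, the final model in the tree above can in principle be loaded like any other Hugging Face causal LM. The sketch below assumes the repository id `BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0` shown in the tree and that the mergekit MoE output is compatible with the standard `transformers` auto classes; adjust dtype, device, or quantization settings to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from the model tree above; compatibility with the
# standard auto classes is assumed, not verified here.
model_id = "BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the full ~54B-parameter model is large; quantization may be needed
    device_map="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```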
@@ -413,18 +435,16 @@ experts:
 - Maintainability and Scalability: Promotes the creation of modular, scalable, and maintainable codebases, essential for large-scale applications.

 Coming Soon:
-BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0-DELLA (DELLA pruned version of this Final Integrated Model)
+- BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v1.0-DELLA (DELLA-pruned version of this Final Integrated Model)
+- BenevolenceMessiah/Yi-Coder-9B-Chat-Instruct-TIES-MoE-v2.X (Can we get any more experimental? You bet! Did we break down the barriers and mechanics between training, fine-tuning, and merging? Maybe...)

 #### Intended Outcomes

 The primary objectives of this project are:

 - `Versatility and Reliability`: Develop a highly versatile MoE model capable of generating production-ready code across numerous programming languages and complex scenarios.
-
 - `Enhanced Efficiency`: Utilize sparse activation and the MoE architecture to increase model capacity without a proportional increase in computational costs.
-
 - `Code Quality and Security`: Ensure the generated code adheres to industry best practices, is secure, and is optimized for performance and scalability.
-
 - `Accelerated Development`: Reduce the time and effort required for software development by providing comprehensive, deployable code snippets that integrate seamlessly into various development pipelines.

 #### Conclusion