Text Generation
GGUF
English
Prototype
8X3B MOE
mixture of experts
reasoning
thinking
thoughts
deepseek
Mixture of Experts
context 128k
Llama 3.2 MOE
creative
creative writing
general usage
problem solving
brainstorming
solve riddles
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
story
writing
fiction
roleplaying
llama 3.2
mergekit
Merge
llama-3
llama-3.2
conversational
Update README.md
README.md CHANGED
@@ -45,17 +45,17 @@ pipeline_tag: text-generation
 
 <img src="thought.jpg" style="float:right; width:300px; height:300px; padding:5px;">
 
-This as a 8X3B, Mixture of Experts model with 4/8 experts (
+This is an 8X3B Mixture of Experts model with 4/8 experts (8 Llama 3.2 fine-tunes) activated, all with
 Reasoning tech installed (in each one), giving you a 24B (8X3B) parameter model in only an 18.4B model size.
 
-This model is a composed of EIGHT finetuned Llama 3.2 3B models for reasoning/
+This model is composed of EIGHT fine-tuned Llama 3.2 3B models for reasoning/thoughts.
 
 This model can be used for creative and non-creative use cases, and for general usage.
 
 Three example prompts with output are posted at the bottom of this page.
 
-This is a very stable model, which can operate at temps 1+ 2+ and higher and generate coherent thought(s) and exceeds
-many other "thinking models" in terms of performance, coherence and depth of thought.
+This is a very stable model which can operate at temps of 1+, 2+ and higher while generating coherent thought(s), and it exceeds
+many other "thinking models" in terms of performance, coherence and depth of thought - including long train-of-thought reasoning.
 
 You can select/set the number of experts to use, from 1 to 8.
 
@@ -75,7 +75,7 @@ See "Experts Activation" to adjust the number of experts used from 1 to 8 with t
 
 PROTOTYPE NOTES:
 
-1. This model may go "on and on" in some cases. Set your context to at least 8k,
+1. This model may go "on and on" in some cases. Set your context to at least 8k; 12k to 16k is better, as the model can easily output 12k+ tokens of thoughts.
 2. Sometimes the model will be "all thought" and "no action" - in this case, stop the generation and tell the model to "execute the plan."
 3. Feel free to really go "wild" with temp with this model, especially for creative use cases.
 4. The models selected are designed for problem solving and deep thinking/reasoning.
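The 24B-nominal vs 18.4B-stored figures in the diff check out arithmetically if only the expert-specific blocks are replicated while the remaining weights (attention, embeddings) are stored once - a rough sketch assuming that split, not stated in the card, with $s$ for the shared parameters and $e$ for the expert-specific parameters of one expert:

$$
s + e = 3.0\,\text{B} \;\;\text{(one 3B model)}, \qquad s + 8e = 18.4\,\text{B} \;\;\text{(the merge)} \;\;\Rightarrow\;\; e \approx 2.2\,\text{B},\; s \approx 0.8\,\text{B}
$$

Under that assumption, with the default 4 of 8 experts active, roughly $s + 4e \approx 9.6\,\text{B}$ parameters are used per token.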
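For the settings the card discusses (context of 12k to 16k, high temps, 1 to 8 active experts), here is a minimal loading sketch using llama-cpp-python. It assumes a GGUF quant of this repo and that the merge exposes the stock llama.cpp metadata key `llama.expert_used_count`; the file name and prompt are hypothetical placeholders, not from the card:

```python
# Minimal sketch (llama-cpp-python); file name and metadata key are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.2-8X3B-MOE-Q4_K_M.gguf",  # hypothetical quant file name
    n_ctx=16384,                                   # card: 8k minimum, 12k-16k better
    kv_overrides={"llama.expert_used_count": 4},   # assumed key: 1..8 active experts
)

out = llm(
    "Plan a heist story outline, then execute the plan:",  # "execute" nudge per note 2
    max_tokens=4096,
    temperature=1.5,    # card: temps of 1+ / 2+ stay coherent
)
print(out["choices"][0]["text"])
```

More active experts generally costs generation speed; the card's default configuration is 4 of 8.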