Text Generation
GGUF
English
Prototype
8X3B MOE
mixture of experts
reasoning
thinking
thoughts
deepseek
Mixture of Experts
context 128k
Llama 3.2 MOE
creative
creative writing
general usage
problem solving
brainstorming
solve riddles
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
story
writing
fiction
roleplaying
llama 3.2
mergekit
Merge
llama-3
llama-3.2
conversational
Update README.md
README.md CHANGED
@@ -45,17 +45,17 @@ pipeline_tag: text-generation
 
 <img src="thought.jpg" style="float:right; width:300px; height:300px; padding:5px;">
 
-This as a 8X3B, Mixture of Experts model with 4/8 experts (
+This is an 8X3B Mixture of Experts model with 4/8 experts (8 Llama 3.2 fine-tunes) activated, all with
 Reasoning tech installed (in each one), giving you a 24B (8X3B) parameter model in only an 18.4B model size.
 
-This model is a composed of EIGHT finetuned Llama 3.2 3B models for reasoning/
+This model is composed of EIGHT fine-tuned Llama 3.2 3B models for reasoning/thoughts.
 
 This model can be used for creative and non-creative use cases, and for general usage.
 
 Three example prompts with output are posted at the bottom of this page.
 
-This is a very stable model, which can operate at temps 1+ 2+ and higher and generate coherent thought(s) and exceeds
-many other "thinking models" in terms of performance, coherence and depth of thought.
+This is a very stable model which can operate at temps of 1+, 2+ and higher while generating coherent thought(s), and it exceeds
+many other "thinking models" in terms of performance, coherence and depth of thought - including long train-of-thought reasoning.
 
 You can select/set the number of experts to use, from 1 to 8.
 
@@ -75,7 +75,7 @@ See "Experts Activation" to adjust the number of experts used from 1 to 8 with t
 
 PROTOTYPE NOTES:
 
-1. This model may go "on and on" in some cases. Set your context to at least 8k,
+1. This model may go "on and on" in some cases. Set your context to at least 8k; 12k to 16k is better, as the model can easily output 12k+ tokens of thoughts.
 2. Sometimes the model will be "all thought" and "no action" - in this case, stop the generation and tell the model to "execute the plan."
 3. Feel free to really go "wild" with temp with this model, especially for creative use cases.
 4. The models selected are designed for problem solving and deep thinking/reasoning.
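The 24B-nominal vs 18.4B-stored figures in the diff check out arithmetically if only the expert-specific blocks are replicated while the remaining weights (attention, embeddings) are stored once - a rough sketch assuming that split, not stated in the card, with $s$ for the shared parameters and $e$ for the expert-specific parameters of one expert:

$$
s + e = 3.0\,\text{B} \;\;\text{(one 3B model)}, \qquad s + 8e = 18.4\,\text{B} \;\;\text{(the merge)} \;\;\Rightarrow\;\; e \approx 2.2\,\text{B},\; s \approx 0.8\,\text{B}
$$

Under that assumption, with the default 4 of 8 experts active, roughly $s + 4e \approx 9.6\,\text{B}$ parameters are used per token.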
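For the settings the card discusses (context of 12k to 16k, high temps, 1 to 8 active experts), here is a minimal loading sketch using llama-cpp-python. It assumes a GGUF quant of this repo and that the merge exposes the stock llama.cpp metadata key `llama.expert_used_count`; the file name and prompt are hypothetical placeholders, not from the card:

```python
# Minimal sketch (llama-cpp-python); file name and metadata key are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.2-8X3B-MOE-Q4_K_M.gguf",  # hypothetical quant file name
    n_ctx=16384,                                   # card: 8k minimum, 12k-16k better
    kv_overrides={"llama.expert_used_count": 4},   # assumed key: 1..8 active experts
)

out = llm(
    "Plan a heist story outline, then execute the plan:",  # "execute" nudge per note 2
    max_tokens=4096,
    temperature=1.5,    # card: temps of 1+ / 2+ stay coherent
)
print(out["choices"][0]["text"])
```

More active experts generally costs generation speed; the card's default configuration is 4 of 8.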