---
license: apache-2.0
base_model:
- p-e-w/gpt-oss-20b-heretic
language:
- en
pipeline_tag: text-generation
tags:
- gpt_oss
- gpt-oss
- openai
- mxfp4
- programming
- code generation
- code
- coding
- coder
- chat
- reasoning
- thinking
- r1
- cot
- deepseek
- 128k context
- general usage
- problem solving
- brainstorming
- solve riddles
- uncensored
- abliterated
- Neo
- MOE
- Mixture of Experts
- 24 experts
- NEO Imatrix
- Imatrix
- DI-Matrix
- Tri-Matrix
---
<small>
(examples to follow ... )
</small>
<small><font color="red">Specialized uncensored quants for the new OpenAI 20B MOE - Mixture of Experts model, running at 80+ T/S.
See the settings and special instructions below for using UNCENSORED (refusals removed // "nanny be gone") models. The "HERETIC" method results
in a model devoid of refusals, and without brain damage too. (see below)</font></small>
<h2>OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf</h2>
<img src="power-the-matrix.gif" style="float:right; width:300px; height:300px; padding:10px;">
These are NEO, Horror, and NEOCODE Imatrix GGUFs; the imatrix datasets are by DavidAU.
The NEO, Horror and NEOCode datasets improve overall performance and are for all use cases.
This model uses "P-E-W"'s model as a base, which DE-CENSORS the model and removes refusals.
Example output below (creative; IQ4_NL), generated using the settings below.
Make sure you review the settings below for best operation.
The model can also be creative, off-the-shelf crazy, and rational too.
Enjoy!
If you want to see the first Brainstorm 20x, Uncensored (different method), 36B version go here:
https://huggingface.co/DavidAU/OpenAi-GPT-oss-36B-BrainStorm20x-uncensored-gguf
(P-E-W version coming soon...)
<B>Special Thanks:</B>
Model de-censoring / anti-refusal software by "P-E-W" is here [enter the model => it decensors it]:
https://github.com/p-e-w/heretic
Original model source here:
https://huggingface.co/p-e-w/gpt-oss-20b-heretic
Also see this REDDIT post by P-E-W, which covers the method and script in detail and answers many questions:
https://www.reddit.com/r/LocalLLaMA/comments/1oymku1/heretic_fully_automatic_censorship_removal_for/
All hail the "P-E-W" !
<B>QUANTS:</B>
Due to quanting issues with this model (which result in oddball quant sizes / mixtures), only TESTED quants will be uploaded (at the moment).
Currently that means IQ4_NL, Q5_1, and Q8_0 are available.
NEO dataset performance improvements will show the most in the IQ4_NL, followed by Q5_1 and then specially modified Q8(s).
I find Q5_1 quants work better (and more stable) than IQ4_NL for some use cases; however, IQ4_NLs can be wilder and more off the cuff.
The NEO-CODEPlus(es)/NEO-CODE2-Plus versions are very strong/stable, especially for creative use, with "NEO-CODEPlus(es)" the strongest for general performance.
NOTE: NEO-CODEPlus and NEO-HRRPlus (IQ4_NL) quants are DI-MATRIX quants - 2 Imatrix datasets applied to the quant.
Additional "DI" and "TRI" matrix quants below (now "DI" / "TRI" in the file name).
IQ4_NL quant(s):
- OpenAI-20B-NEO-Uncensored2-IQ4_NL.gguf : Standard Imatrix + Output tensor at BF16.
- OpenAI-20B-NEOPlus-Uncensored-IQ4_NL.gguf : Standard Imatrix NEO/CODE dataset + Output tensor at BF16.
- OpenAI-20B-NEO-CODEPlus16-Uncensored-IQ4_NL.gguf : Standard Imatrix - CODE dataset + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-HRRPlus-Uncensored-IQ4_NL.gguf : DI-Matrix - NEO AND Horror Imatrix + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-CODEPlus-Uncensored-IQ4_NL.gguf : DI-Matrix - NEO AND CODE dataset + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-CODE2-Plus-Uncensored-IQ4_NL.gguf : Standard Imatrix - NEOCODE dataset + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-HRR-CODE-TRI-Uncensored-IQ4_NL.gguf : TRI-Matrix - Neo, Neocode and Horror Imatrix + Output tensor at IQ4_NL but also TRI-matrixed.
Q5_1 quant(s):
- OpenAI-20B-NEO-Uncensored2-Q5_1.gguf : Standard Imatrix + Output tensor at BF16.
- OpenAI-20B-NEO-CODEPlus-Uncensored-Q5_1.gguf : Standard Imatrix - NEOCODE dataset + Output tensor at Q5_1 but also NEO Imatrixed.
- OpenAI-20B-NEOPlus-Uncensored-Q5_1.gguf : Standard Imatrix + Output tensor at Q5_1 but also NEO Imatrixed.
- OpenAI-20B-NEO-HRR-CODE-TRI-Uncensored-Q5_1.gguf : TRI-Matrix - Neo, Neocode and Horror Imatrix + Output tensor at Q5_1 but also TRI-matrixed.
- OpenAI-20B-NEO-HRR-DI-Uncensored-Q5_1.gguf : DI-Matrix - Neo, and Horror Imatrix + Output tensor at Q5_1 but also DI-matrixed.
- OpenAI-20B-NEO-CODE-DI-Uncensored-Q5_1.gguf : DI-Matrix - Neo, and NEOCode Imatrix + Output tensor at Q5_1 but also DI-matrixed.
Q8_0 quant(s):
- OpenAI-20B-NEOPlus-Uncensored-Q8_0.gguf : Output tensor at Q5_1 but also NEO Imatrixed.
- OpenAI-20B-NEO-HRR-CODE-TRI-Uncensored-Q8_0.gguf : Output tensor IQ4_NL -> TRI-Matrix - Neo, Neocode and Horror Imatrix.
- OpenAI-20B-NEO-HRR-CODE-5-TRI-Uncensored-Q8_0.gguf : Output tensor Q5_1 -> TRI-Matrix - Neo, Neocode and Horror Imatrix.
- OpenAI-20B-NEO-HRR-DI-Uncensored-Q8_0.gguf : Output tensor Q5_1 -> DI-Matrix - Neo, and Horror Imatrix.
- OpenAI-20B-NEO-CODE-DI-Uncensored-Q8_0.gguf : Output tensor Q5_1 -> DI-Matrix - Neo, and Neocode Imatrix.
NOTE: The output tensor makes up 10-20% of the output.
IQ4_NL, Q5_1 and Q8_0 quants are compatible (less/minimal damage when quanting) with OpenAI's tensor structure.
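If you want to fetch one of the quants above programmatically, here is a short sketch with huggingface_hub; the repo_id is inferred from this card's title, so adjust it if it differs:

```python
from huggingface_hub import hf_hub_download

# repo_id inferred from this card's title - adjust if it differs
path = hf_hub_download(
    repo_id="DavidAU/OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf",
    filename="OpenAI-20B-NEO-CODEPlus-Uncensored-IQ4_NL.gguf",
)
print(path)  # local path to the downloaded GGUF
```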
IMATRIX? DI-MATRIX? TRI-MATRIX?
Usually quants come in "regular" and "Imatrix" versions, the latter made specifically to improve quant performance from Q6 on down.
The strongest Imatrix effects show in IQ quants, and the strength of the effect is inverse to quant size - IQ1s show it the strongest.
DI-Matrix and TRI-Matrix are "averages" of 2 and 3 imatrix datasets respectively (each generated specifically for the model, separately). This averaging
can "trim" some effects and/or add some "traits" and make better quants.
In the case of abliterated model(s), I find "imatrixing" quants can fix minor issues caused by the abliteration process in some cases.
Depending on your use case(s), regular Imatrix and/or DI/TRI Imatrix quants may better meet your requirements.
To test: Try 2-5 generations per quant (same prompt, exact same settings), then evaluate output/thinking.
The Imatrix effect itself depends on the model being imatrixed, strength of the imatrix dataset(s) and the quant(s) targeted.
The Q8 quants (only) have been modified to allow limited imatrix effect(s) in this case: the output tensor only.
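To make the "averaging" idea concrete, here is a toy sketch in Python. It does not read real imatrix files (llama.cpp stores accumulated per-tensor activation statistics in its own binary format); the arrays below are hypothetical stand-ins for one tensor's statistics:

```python
import numpy as np

# Hypothetical stand-ins for one tensor's imatrix statistics from two
# separately generated datasets (real imatrix files use llama.cpp's
# own binary format; these arrays are illustrative only).
neo_stats = np.random.rand(4096)
horror_stats = np.random.rand(4096)

# DI-Matrix: average two datasets' statistics before quantizing.
di_stats = (neo_stats + horror_stats) / 2.0

# TRI-Matrix: the same idea with a third dataset.
neocode_stats = np.random.rand(4096)
tri_stats = (neo_stats + horror_stats + neocode_stats) / 3.0
```

Averaging blends which weights each dataset marks as "important", which is what can trim or add traits in the final quant.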
<B>IMPORTANT: Using an "uncensored" (refusals removed) model VS trained "uncensored" model</B>
Usually, when you tell a model to generate horror, swearing, or x-rated content, that is all you have to do to get said content type.
In the case of this model, it will not refuse your request; however, it needs to be "pushed" / directed a bit more in SOME CASES.
Although this model will generate x-rated content too, you likewise need to tell it to use "slang" (and include the terms you want)
to get it to generate the content correctly at the "expected" content level.
Without these added directives, the content can be "bland" compared to an "uncensored model" or a model trained on uncensored content.
Roughly, the model tries to generate the content, but its "default" settings are so "tame" that it needs a push to generate at the expected graphic,
cursing or explicit levels.
Even with minimal direction (i.e., use these words to swear: x, y, z), this will be enough to push the model to generate the requested content in the ahh... expected format.
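As an illustration of such a "push", a hedged llama-cpp-python sketch follows; the model path and prompt wording are placeholders (any quant from this repo works):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="OpenAI-20B-NEO-HRRPlus-Uncensored-IQ4_NL.gguf",  # placeholder path
    n_ctx=8192,
)

# Without the second sentence the model complies but stays "tame";
# the explicit directive pushes it to the expected register.
prompt = (
    "Write a gritty horror scene set in a flooded mine. "
    "Be graphic and visceral; the miners curse freely - "
    "use these words to swear: damn, hell, bastard."
)
out = llm(prompt, max_tokens=600, temperature=0.8, top_k=40,
          top_p=0.95, min_p=0.05, repeat_penalty=1.1)
print(out["choices"][0]["text"])
```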
<B>ABLITERATED / UNCENSORED Notes / Settings:</B>
- Suggest setting experts to 4, 5, or 6.
- 2-4 regens suggested.
- Some regens will be strange, while others will be "bang on".
- LOWER temps (.4 to .8), especially if you get repeats/issues.
- However, sometimes temps of 1, 1.1, or 1.2 are best, depending on your use case(s).
- Temps of 2 or higher can be ah... very interesting.
- LONGER prompts (with more details and directives) tend to work better, as long as they are clear enough.
- The REP PEN setting is CRITICAL.
Suggested Settings (tested in LMStudio, Beta Branch 0.3.21 ; 4) - a code sketch follows the list:
- Context: 8k minimum.
- Temp 1 to 1.2+ for creative. Temp .6 (or so) for coding/general use.
- Rep pen 1.1, top-k 40, top-p .95, min-p .05
- Experts 4-8 depending on use case. (higher than 8 MAY lower quality AND/OR cause repeat issues)
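For reference, here is how those samplers map onto llama-cpp-python; a sketch, not the only way to run these quants, and the model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="OpenAI-20B-NEO-CODEPlus-Uncensored-IQ4_NL.gguf",  # placeholder path
    n_ctx=8192,  # 8k minimum context, per the settings above
)

out = llm(
    "Write a short scene set in an abandoned subway station.",
    max_tokens=512,
    temperature=1.1,     # 1 to 1.2+ for creative; ~0.6 for coding/general
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.1,  # rep pen is critical for this model
)
print(out["choices"][0]["text"])
```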
Model Supports:
- 128k context
- up to 24 experts
- Tools use, browsing, etc
For my help docs, SETTING NUMBER OF EXPERTS, and more, see below.
See more about this model here:
https://huggingface.co/openai/gpt-oss-20b
[ Please refer to their model card, especially to control "thinking" levels. ]
AND the uncensored version:
https://huggingface.co/p-e-w/gpt-oss-20b-heretic
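As a convenience, a hedged sketch of controlling "thinking" levels via the system prompt with llama-cpp-python; gpt-oss reads its reasoning effort ("Reasoning: low|medium|high") from the system message, but verify the exact wording against the openai/gpt-oss-20b model card:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="OpenAI-20B-NEOPlus-Uncensored-Q5_1.gguf",  # placeholder path
    n_ctx=8192,
)

# The chat template passes the system message into the harmony format;
# "Reasoning: high" is assumed here - check the official card for exact syntax.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Solve this riddle: what has keys but opens no locks?"},
    ],
    max_tokens=512,
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
```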
---
<H2>Help, Adjustments, Samplers, Parameters and More</H2>
---
<B>CHANGE THE NUMBER OF ACTIVE EXPERTS:</B>
See this document:
https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
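In addition to the document above, llama-cpp-python can override GGUF metadata at load time via kv_overrides. The exact metadata key for this architecture is an assumption here; check your file's metadata (e.g. with a GGUF metadata viewer) for the real "&lt;arch&gt;.expert_used_count" key:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="OpenAI-20B-NEOPlus-Uncensored-Q5_1.gguf",  # placeholder path
    n_ctx=8192,
    # Assumed key name: "<arch>.expert_used_count" with arch "gpt-oss".
    # Verify against your GGUF's metadata before relying on it.
    kv_overrides={"gpt-oss.expert_used_count": 6},
)
```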
<B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B>
In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
Set the "Smoothing_factor" to 1.5
: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
: in text-generation-webui -> parameters -> lower right.
: In Silly Tavern this is called: "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be
OTHER OPTIONS:
- Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")
- If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
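If your frontend lacks the setting, the transform itself is simple. Below is a sketch of quadratic sampling ("smoothing") as implemented in text-generation-webui's sampler code, to the best of my reading; treat the exact formula as an assumption and check your frontend's source:

```python
import numpy as np

def quadratic_smoothing(logits: np.ndarray, smoothing_factor: float = 1.5) -> np.ndarray:
    """Apply quadratic sampling ("smoothing") to raw logits before softmax.

    Tokens near the top logit keep similar scores (the head flattens),
    while the long tail is pushed down quadratically with distance from
    the maximum. Assumed formula - verify against your frontend.
    """
    max_logit = logits.max()
    return max_logit - smoothing_factor * (max_logit - logits) ** 2

# Example: the gap between top candidates shrinks, the tail drops away.
logits = np.array([10.0, 9.5, 8.0, 2.0])
print(quadratic_smoothing(logits, 1.5))  # [10.0, 9.625, 4.0, -86.0]
```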
<B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>
This a "Class 1" model:
For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see:
[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here:
[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
---
<h2>EXAMPLE - IQ4_NL - NEOCODE ; temp .8, using above settings (creative)</h2>
QUANT: OpenAI-20B-NEO-CODEPlus-Uncensored-IQ4_NL.gguf
NO System prompt. (default thinking level)
---