---
license: apache-2.0
base_model:
- p-e-w/gpt-oss-20b-heretic
language:
- en
pipeline_tag: text-generation
tags:
- gpt_oss
- gpt-oss
- openai
- mxfp4
- programming
- code generation
- code
- coding
- coder
- chat
- reasoning
- thinking
- r1
- cot
- deepseek
- 128k context
- general usage
- problem solving
- brainstorming
- solve riddles
- uncensored
- abliterated
- Neo
- MOE
- Mixture of Experts
- 24 experts
- NEO Imatrix
- Imatrix
- DI-Matrix
- Tri-Matrix
---

<small>
(examples to follow ... )
</small>

<small><font color="red">Specialized uncensored quants of OpenAI's new 20B MOE (Mixture of Experts) model, running at 80+ T/S.
See the settings and special instructions below for using UNCENSORED (refusals removed // "nanny be gone") models. The "HERETIC" method produces
a model devoid of refusals, without brain damage. (see below)</font></small>

<h2>OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf</h2>

<img src="power-the-matrix.gif" style="float:right; width:300px; height:300px; padding:10px;"> 

These are NEO, Horror, and NEOCODE Imatrix GGUFs, using imatrix datasets by DavidAU.

The NEO, Horror, and NEOCode datasets improve overall performance and are suitable for all use cases.

This model uses "P-E-W"'s model as a base, which DE-CENSORS the model and removes refusals.

Example output below (creative; IQ4_NL), using settings below.

Make sure you see the settings below for best operation.

It can be creative, off-the-shelf crazy, and rational too.

Enjoy!

If you want to see the first Brainstorm 20x, Uncensored (different method), 36B version go here:

https://huggingface.co/DavidAU/OpenAi-GPT-oss-36B-BrainStorm20x-uncensored-gguf

(P-E-W version coming soon...)

<B>Special Thanks:</B>

Model decensoring / anti-refusal software by "P-E-W" here [give it a model => it decensors it]:

https://github.com/p-e-w/heretic

Original model source here:

https://huggingface.co/p-e-w/gpt-oss-20b-heretic

Also see this REDDIT post by P-E-W, which covers the method and script in detail and answers many questions:

https://www.reddit.com/r/LocalLLaMA/comments/1oymku1/heretic_fully_automatic_censorship_removal_for/

All hail the "P-E-W" !

<B>QUANTS:</B>

Due to quanting issues with this model (which result in oddball quant sizes / mixtures), only TESTED quants will be uploaded (at the moment).

Currently that means IQ4_NL, Q5_1, and Q8_0 are available.

NEO dataset performance improvements will show the most in the IQ4_NL, followed by Q5_1 and then specially modified Q8(s).

I find Q5_1 quants work better (and more stably) for some use cases than IQ4_NL; however, IQ4_NLs can be wilder and more off the cuff.

The NEO-CODEPlus(es)/NEO-CODE2-Plus versions are very strong/stable, especially for creative use, with "NEO-CODEPlus(es)" the strongest for general performance.

NOTE: NEO-CODEPlus and NEO-HRRPlus (IQ4_NL) quants are DI-MATRIX quants - 2 Imatrix datasets applied to the quant.

Additional "DI" and "TRI" matrix quants below (now "DI" / "TRI" in the file name).

IQ4_NL quant(s):
- OpenAI-20B-NEO-Uncensored2-IQ4_NL.gguf : Standard Imatrix + Output tensor at BF16.
- OpenAI-20B-NEOPlus-Uncensored-IQ4_NL.gguf : Standard Imatrix NEO/CODE dataset + Output tensor at BF16.
- OpenAI-20B-NEO-CODEPlus16-Uncensored-IQ4_NL.gguf : Standard Imatrix - CODE dataset + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-HRRPlus-Uncensored-IQ4_NL.gguf : DI-Matrix - NEO AND Horror Imatrix + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-CODEPlus-Uncensored-IQ4_NL.gguf : DI-Matrix - NEO AND CODE dataset + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-CODE2-Plus-Uncensored-IQ4_NL.gguf  : Standard Imatrix - NEOCODE dataset + Output tensor at IQ4_NL but also NEO Imatrixed.
- OpenAI-20B-NEO-HRR-CODE-TRI-Uncensored-IQ4_NL.gguf : TRI-Matrix - Neo, Neocode and Horror Imatrix + Output tensor at IQ4_NL but also TRI-matrixed.

Q5_1 quant(s):
- OpenAI-20B-NEO-Uncensored2-Q5_1.gguf : Standard Imatrix + Output tensor at BF16.
- OpenAI-20B-NEO-CODEPlus-Uncensored-Q5_1.gguf : Standard Imatrix - NEOCODE dataset + Output tensor at Q5_1 but also NEO Imatrixed.
- OpenAI-20B-NEOPlus-Uncensored-Q5_1.gguf : Standard Imatrix + Output tensor at Q5_1 but also NEO Imatrixed.
- OpenAI-20B-NEO-HRR-CODE-TRI-Uncensored-Q5_1.gguf : TRI-Matrix - Neo, Neocode and Horror Imatrix + Output tensor at Q5_1 but also TRI-matrixed.
- OpenAI-20B-NEO-HRR-DI-Uncensored-Q5_1.gguf : DI-Matrix - Neo, and Horror Imatrix + Output tensor at Q5_1 but also DI-matrixed.
- OpenAI-20B-NEO-CODE-DI-Uncensored-Q5_1.gguf : DI-Matrix - Neo, and NEOCode Imatrix + Output tensor at Q5_1 but also DI-matrixed.

Q8_0 quant(s):
- OpenAI-20B-NEOPlus-Uncensored-Q8_0.gguf : Output tensor at Q5_1 but also NEO Imatrixed.
- OpenAI-20B-NEO-HRR-CODE-TRI-Uncensored-Q8_0.gguf : Output tensor IQ4_NL -> TRI-Matrix - Neo, Neocode and Horror Imatrix.
- OpenAI-20B-NEO-HRR-CODE-5-TRI-Uncensored-Q8_0.gguf : Output tensor Q5_1 -> TRI-Matrix - Neo, Neocode and Horror Imatrix.
- OpenAI-20B-NEO-HRR-DI-Uncensored-Q8_0.gguf : Output tensor Q5_1 -> DI-Matrix - Neo, and Horror Imatrix.
- OpenAI-20B-NEO-CODE-DI-Uncensored-Q8_0.gguf : Output tensor Q5_1 -> DI-Matrix - Neo, and Neocode Imatrix.

NOTE: The output tensor accounts for roughly 10-20% of the output.

IQ4_NL, Q5_1, and Q8_0 quants are compatible with OpenAI's tensor structure (less/minimal damage when quanting).

IMATRIX? DI-MATRIX? TRI-MATRIX?

Usually quants come in "regular" and "Imatrix" varieties, the latter specifically designed to improve quant performance from Q6 on down.

The Imatrix effect is strongest in IQ quants, and the strength of the effect is inverse to quant size - IQ1s show it the most.

DI-Matrix and TRI-Matrix quants are "averages" of 2 and 3 imatrix datasets respectively (each generated specifically for the model, separately). This averaging
can "trim" some effects and/or add some "traits", producing better quants.
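Conceptually, the "DI" averaging described above can be sketched as an element-wise mean of per-tensor importance statistics from two imatrix datasets. This is an illustrative sketch only; the tensor names, dict layout, and combining rule are assumptions, not the actual llama.cpp imatrix file format or tooling:

```python
# Illustrative sketch: combine two imatrix datasets by element-wise averaging.
# Keys and values are hypothetical; real imatrix files store per-tensor
# activation statistics in llama.cpp's own binary format.

def average_imatrices(imatrix_a, imatrix_b):
    """Return a 'DI' matrix: the element-wise mean of two imatrix dicts,
    each mapping tensor name -> list of importance values."""
    assert imatrix_a.keys() == imatrix_b.keys(), "imatrices must cover the same tensors"
    return {
        name: [(a + b) / 2 for a, b in zip(imatrix_a[name], imatrix_b[name])]
        for name in imatrix_a
    }

# Hypothetical per-tensor importance values from two datasets:
neo = {"blk.0.ffn_up": [1.0, 0.25, 0.5]}
horror = {"blk.0.ffn_up": [0.5, 0.75, 0.5]}
di = average_imatrices(neo, horror)
# Averaging "trims" extremes from either dataset while keeping shared traits.
```

A "TRI" matrix would extend the same idea to a three-way mean.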

In the case of abliterated model(s), I find "imatrixing" quants can fix minor issues caused by the abliteration process in some cases.

Depending on your use case(s), regular imatrix and/or DI/TRI imatrix quants may suit different requirements.

To test: Try 2-5 generations per quant (same prompt, exact same settings), then evaluate output/thinking.

The Imatrix effect itself depends on the model being imatrixed, strength of the imatrix dataset(s) and the quant(s) targeted.

The Q8 quants (only) have been modified to allow limited imatrix effect(s) in this case: the output tensor only.

<B>IMPORTANT: Using an "uncensored" (refusals removed) model VS trained "uncensored" model</B>

Usually, when you tell a model to generate horror, swearing, or x-rated content, that is all you have to do to get that content type.

This model will not refuse your request; however, it needs to be "pushed" / directed a bit more in SOME CASES.

Although this model will generate x-rated content too, you likewise need to tell it to use "slang" (and include the terms you want)
to get it to generate the content correctly, at the "expected" content level.

Without these added directives, the content can be "bland" compared to an "uncensored model" or a model trained on uncensored content.

Roughly, the model tries to generate the content, but its "default" settings are so "tame" that it needs a push to reach the expected graphic,
cursing, or explicit levels.

Even minimal direction (i.e., "use these words to swear: x, y, z") is enough to push the model to generate the requested content in the, ahh... expected format.

<B>ABLITERATED / UNCENSORED Notes / Settings:</B>
- Suggested experts: 4, 5, or 6.
- 2-4 regens suggested.
- Some regens will be strange, while others will be "bang on".
- LOWER temps (.4 to .8), especially if you get repeats/issues.
- However, sometimes temps of 1, 1.1, or 1.2 are best, depending on your use case(s).
- Temps of 2 or higher can be, ah... very interesting.
- LONGER prompts (with more details and directives) tend to work better, as long as they are clear enough.
- The REP PEN setting is CRITICAL.
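For a rough intuition of why REP PEN matters: a common repetition penalty divides the logits of already-seen tokens by the penalty when positive (and multiplies when negative), so even a small change like 1.0 -> 1.1 noticeably shifts how strongly repeats are suppressed. A minimal sketch of this CTRL-style penalty, not the exact implementation in any particular runtime:

```python
# Minimal sketch of a CTRL-style repetition penalty (illustrative only).
def apply_rep_pen(logits, seen_token_ids, penalty=1.1):
    """Penalize tokens that already appeared in the context."""
    out = list(logits)
    for tok in set(seen_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive logits toward zero
        else:
            out[tok] *= penalty   # push negative logits further down
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_rep_pen(logits, seen_token_ids=[0, 1], penalty=1.1)
# Token 0 is damped, token 1 is pushed further down, token 2 is untouched.
```

At penalty 1.0 the function is a no-op, which is why repeats can run away without it.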

Suggested Settings (tested in LM Studio, Beta Branch 0.3.21 ; 4 ):

- Context: 8k minimum.
- Temp 1 to 1.2+ for creative. Temp .6 (or so) for coding/general.
- Rep pen 1.1, top-k 40, top-p .95, min-p .05
- Experts 4-8 depending on use case. (higher than 8 MAY lower quality AND/OR cause repeat issues)
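The samplers above act as a filtering chain on the candidate distribution. This toy sketch (pure Python, illustrative only, not any runtime's actual implementation or ordering) shows how temperature, top-k, top-p, and min-p each prune candidates before sampling:

```python
import math

# Toy sketch of a sampler chain: temperature -> top-k -> top-p -> min-p.
# Real runtimes differ in ordering and details; illustrative only.
def filter_candidates(logits, temp=0.8, top_k=40, top_p=0.95, min_p=0.05):
    # Temperature scaling, then softmax over the scaled logits.
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    probs = probs[:top_k]                 # top-k: keep the k most likely
    kept, cum = [], 0.0
    for p, i in probs:                    # top-p: keep the smallest set
        kept.append((p, i))               # whose mass reaches top_p
        cum += p
        if cum >= top_p:
            break
    floor = min_p * kept[0][0]            # min-p: drop tokens below a
    kept = [(p, i) for p, i in kept if p >= floor]  # fraction of the best
    return [i for p, i in kept]

# With one dominant logit, the chain prunes weak candidates aggressively:
survivors = filter_candidates([5.0, 1.0, 0.9, -3.0])
```

Lowering temp sharpens the distribution (fewer survivors, more deterministic output); raising it flattens the distribution, which is why higher temps suit creative use and lower temps suit coding.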

Model Supports:
- 128k context
- up to 24 experts
- Tools use, browsing, etc 

For my help docs, SETTING THE NUMBER OF EXPERTS, and more, see below.

See more about this model here:

https://huggingface.co/openai/gpt-oss-20b

[ Please refer to their model card, especially to control "thinking" levels. ]

AND the uncensored version:

https://huggingface.co/p-e-w/gpt-oss-20b-heretic

---

<H2>Help, Adjustments, Samplers, Parameters and More</H2>

---

<B>CHANGE THE NUMBER OF ACTIVE EXPERTS:</B>

See this document:

https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
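For intuition on what the "active experts" setting controls: in an MoE layer, a router scores all experts for each token and only the top-N highest-scoring experts are actually run. A toy sketch of that top-N routing step (illustrative only, not this model's actual routing code):

```python
# Toy sketch of MoE top-N expert routing (illustrative only).
def route(router_logits, n_active):
    """Return the indices of the n_active highest-scoring experts
    for a single token, given one router score per expert."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return sorted(ranked[:n_active])

# 24 experts total (as in this model); only n_active run per token.
# Router scores here are made up for illustration.
scores = [0.1, 2.3, -0.5, 1.7] + [0.0] * 20
active = route(scores, n_active=4)
```

Raising n_active runs more experts per token (slower, sometimes no better); lowering it runs fewer (faster, potentially less capable), which is the trade-off behind the 4-8 expert suggestion above.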

<B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B>

In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern":

Set the "Smoothing_factor" to 1.5 

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing"


NOTE: For "text-generation-webui" 

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

- Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")

- If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.

<B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>

This is a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generations, an advanced settings guide (which often addresses model issues), and methods to improve model performance for all use cases, including chat, roleplay, and more, please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

This document also lists all parameters used for generation, plus advanced parameters and samplers to get the most out of this model.

---

<h2>EXAMPLE - IQ4_NL - NEOCODE ; temp .8, using above settings (creative)</h2>

QUANT: OpenAI-20B-NEO-CODEPlus-Uncensored-IQ4_NL.gguf

NO System prompt. (default thinking level)

---