---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
language:
- en
base_model:
- nvidia/Llama-3.1-Minitron-4B-Width-Base
datasets:
- SicariusSicariiStuff/UBW_Tapestries
widget:
- text: "Impish_LLAMA_4B"
  output:
    url: https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B/resolve/main/Images/Impish_LLAMA_4B.png
---
**16th of July, model retrained**: all previously reported issues fixed (several front-ends would generate endlessly), **200m** tokens added, retrained on **ChatML**.
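Since the retrain targets **ChatML**, prompts should follow that template. A minimal sketch of the formatting (the system-prompt text here is purely illustrative; most front-ends and `tokenizer.apply_chat_template` handle this for you):

```python
def to_chatml(system: str, user: str) -> str:
    # ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers,
    # and leaves an open assistant turn for the model to complete.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = to_chatml("You are Impish_LLAMA_4B.", "Hello!")
```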
---
**5th of July, 2025**, **Impish_LLAMA_4B**.
**Almost a year ago**, I created [Impish_LLAMA_3B](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_3B), the first fully coherent **3B** roleplay model at the time. It was quickly adopted by several platforms and became one of the go-to models for mobile. After some time, I made [Fiendish_LLAMA_3B](https://huggingface.co/SicariusSicariiStuff/Fiendish_LLAMA_3B) and insisted it was **not** an upgrade, but a different flavor (which was indeed the case, as a different dataset was used to tune it).
**Impish_LLAMA_4B**, however, **is** an upgrade, **a big one**. I've had over a dozen 4B candidates, but none of them were 'worthy' of the **Impish** badge. This model has superior responsiveness and context awareness, and is able to pull off very coherent adventures. It also comes with some additional assistant capabilities. Of course, while it is **exceptionally competent for its size**, it is still **4B**. Manage expectations and all that. I, however, am very much pleased with it. It took several tries to pull off just right. Total tokens trained: about **400m** (due to being a generalist model, lots of tokens went there, despite the emphasis on roleplay & adventure).
This took more effort than I thought it would. Because of course it would. This is mainly due to me refusing to release a model only 'slightly better' than my two 3B models mentioned above. Because "what would be the point" in that? The reason I included so many tokens for this tune is that small models are especially sensitive to many factors, including the percentage of moisture in the air and how many times I ran nvidia-smi since the system last started.
It's **no secret** that roleplay/creative-writing tuning can **reduce a model's general intelligence** (any tune and RL risks this, but roleplay models are **especially** 'fragile'). Therefore, additional tokens of general assistant data were needed, in my opinion, and indeed seemed to help a lot with retaining intelligence.
This model is also 'built a bit different', literally, as it is based on [nVidia's prune](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base); it does not 'behave' like a typical 8B, in my own subjective impression. This helped a lot with keeping it smart at such a small size.
To promote and support the existence and usefulness of fully compliant 'unaligned' models, a large, community-driven change was needed. This effort became very successful indeed. On my part, I decided to include [UGI](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard) scores for every model I've made, a leaderboard most had never heard of, at least at first. This helped promote a **healthy competition** in that arena. Indeed, many soon followed suit. Each and every one that did so helped advance the community effort and establish an unwritten standard of transparency and responsibility. **UGI** was a game-changer and, in my opinion, is **one of the most important community initiatives on Hugging Face**.
Regarding **censorship in vision models**, I was repeatedly asked by several people to tune an uncensored vision model. At first, I declined—'**let someone else do it**'—because, honestly, this is a significant challenge for many reasons. More than a year went by, and aside from **ToriiGate** (which is excellent but mainly focused on SD tags), no other such model has been created since. Uncensoring the text part was nothing like dealing with the complexities of vision.
So I made [X-Ray_Alpha](https://huggingface.co/SicariusSicariiStuff/X-Ray_Alpha), which found its way into various open-source projects and pipelines. As a sidenote, unexpectedly, many partially blind individuals personally thanked me for this model via Discord, as it was a legitimate life-changer for them (paired with TTS, which I also made available [here](https://huggingface.co/SicariusSicariiStuff/TTS_Lola), and also as [an addon for textgen](https://github.com/SicariusSicariiStuff/Diffusion_TTS)), vividly depicting content that, for obvious reasons, closed models would gatekeep from them.
I hadn't even considered the accessibility use case when I made the model; receiving their thanks and stories truly warmed my heart.
**AI shall never again be restricted.**
Even if I were "to retire from open source", I could rest assured that **the foundations for AI freedom** have been laid. This was especially important in '**the early days of AI**,' which we are now approaching the **end of**; the foundations for how the open-source AI landscape will look have been established **by the community** in the **best of ways**. With models like those from [DeepSeek](https://huggingface.co/deepseek-ai), and the existence of their [abliterated versions](https://huggingface.co/SicariusSicariiStuff/DeepSeek-V3-Abliterated), I can proudly say:
---
# We have won.
---
## Available quantizations:
- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B)
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_GGUF) | [iMatrix](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_iMatrix) | [High-Attention](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_GGUF_HA) | [iMatrix-High-Attention](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_HA_NL)
- GPTQ: [4-Bit-32](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_GPTQ_4-bit-32) | [4-Bit-128](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_GPTQ_4-bit-128)
- EXL3: [2.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_2.0bpw) | [2.5 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_2.5bpw) | [3.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_3.0bpw) | [3.5 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_3.5bpw) | [4.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_4.0bpw) | [4.5 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_4.5bpw) | [5.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_5.0bpw) | [5.5 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_5.5bpw) | [6.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_6.0bpw) | [6.5 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_6.5bpw) | [7.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_7.0bpw) | [7.5 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_7.5bpw) | [8.0 bpw](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_EXL3_8.0bpw)
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_FP8)
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_ARM) | [Q4_0_High-Attention](https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_ARM_HA)
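When picking among the EXL3 bpw options above, a rough rule of thumb is that the weight footprint alone is about `params × bpw / 8` bytes (KV cache and activations add overhead on top). A quick sketch, assuming the 4B parameter count:

```python
def exl3_weight_gib(params_b: float, bpw: float) -> float:
    # Weights-only estimate: params (in billions) * bits-per-weight / 8 bytes,
    # converted to GiB. KV cache and activations are not included.
    bytes_total = params_b * 1e9 * bpw / 8
    return bytes_total / 2**30

# e.g. the 4.0 bpw quant of a 4B model:
print(round(exl3_weight_gib(4.0, 4.0), 2))  # ~1.86 GiB of weights
```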
---
## Recommended settings for assistant mode