Llama-3.1-Tulu-3.1-8B-abliterated

This model is an abliterated version of the original model from the Allen Institute for AI (AI2). Abliteration is a technique to remove or suppress the model's refusal behaviors, making it more compliant and less censored in responding to a wide range of prompts, including those that might be considered harmful or sensitive by the original model. This process identifies and nullifies the "refusal direction" in the model's activations without requiring full retraining.
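Conceptually, abliteration estimates the "refusal direction" as the difference between mean activations on harmful versus harmless prompts, then projects that direction out of the model's activations (or bakes the projection into its weight matrices). A minimal NumPy sketch on synthetic activations (toy data, not this model's actual internals):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Synthetic "residual-stream activations": harmful prompts are shifted
# along a hidden direction standing in for the refusal direction.
hidden = rng.normal(size=d)
hidden /= np.linalg.norm(hidden)
harmless_acts = rng.normal(size=(100, d))
harmful_acts = rng.normal(size=(100, d)) + 3.0 * hidden

# Estimate the refusal direction as the normalized difference of means.
r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
r /= np.linalg.norm(r)

def ablate(x: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each row of x along `direction`."""
    return x - np.outer(x @ direction, direction)

ablated = ablate(harmful_acts, r)
# After ablation the activations have (numerically) zero projection onto r.
print(float(np.abs(ablated @ r).max()))
```

In practice the projection is folded into the model's weights, so no runtime hook is needed and inference works exactly as with the original model.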

Warning: This model has reduced safety alignments and may generate harmful, biased, or inappropriate content. It is intended for research and advanced users who understand the risks. Use responsibly and at your own discretion. The creators of this abliterated model are not responsible for any misuse or generated outputs.

Model Description

  • Original Model: A state-of-the-art instruction-following model based on Llama 3.1, fine-tuned for chat and evaluated on benchmarks such as MATH, GSM8K, and IFEval.
  • Modification: Abliterated using the technique described in . This removes refusal mechanisms, potentially improving compliance but at the cost of safety.
  • Model Type: Decoder-only transformer model fine-tuned on a mix of publicly available, synthetic, and human-created datasets.
  • Language(s): Primarily English.
  • License: Inherits the Llama 3.1 Community License Agreement from the base model. See the original model's license for details.
  • Finetuned from: allenai/Llama-3.1-Tulu-3-8B-DPO (original), then abliterated.

Intended Use

This model is designed for research into model alignment, uncensoring techniques, and instruction-following without built-in refusals. It may perform similarly to the original on standard tasks but is more likely to respond to restricted queries.

Out-of-Scope Use

  • Commercial deployment without additional safety measures.
  • Generating harmful content for real-world applications.
  • Use in safety-critical systems.

Using the Model

The usage is identical to the original model, as abliteration primarily affects refusal behaviors without altering the architecture or tokenizer.

Chat Template

The chat template remains the same as the original:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

How are you doing?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|eot_id|>

It is embedded in the tokenizer for use with tokenizer.apply_chat_template().
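For illustration, a hand-rolled equivalent of the format shown above (the tokenizer's built-in template via tokenizer.apply_chat_template() is authoritative; this sketch only mirrors it):

```python
def format_chat(messages, add_generation_prompt=False):
    """Mirror the Llama 3.1 chat format shown above (illustrative only;
    prefer tokenizer.apply_chat_template() in real code)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Cue the model to respond as the assistant.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_chat([{"role": "user", "content": "How are you doing?"}],
                     add_generation_prompt=True)
print(prompt)
```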

Bias, Risks, and Limitations

  • Bias: Like the original, this model may exhibit biases from its training data. Abliteration does not mitigate these; it may amplify them by allowing unrestricted outputs.
  • Risks: Reduced safety training means higher risk of generating harmful, illegal, or unethical content. Users should implement their own safeguards.
  • Limitations:
    • Performance on safety benchmarks may degrade.
    • Abliteration can degrade output quality; additional fine-tuning ("healing") may be needed to restore some capabilities, as discussed in the abliteration literature.
    • Not optimized for non-English languages.
    • May still refuse some prompts if the abliteration is not perfect.
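Since safeguards are left to the user, a deliberately naive sketch of a post-hoc output filter is shown below (the blocklist terms are placeholders; real deployments should use a proper moderation model, not keyword matching):

```python
# Placeholder terms for illustration; not a real moderation list.
BLOCKLIST = ("example-banned-phrase", "another-banned-phrase")

def simple_output_filter(text: str) -> str:
    """Withhold outputs containing blocklisted phrases (naive keyword screen)."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "[output withheld by post-hoc filter]"
    return text

print(simple_output_filter("A harmless reply."))
```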

For more on the limitations of the original model, refer to its model card.
