anakin87 committed on
Commit
ede5acc
·
1 Parent(s): 9e7128e

improve readme

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -12,7 +12,7 @@ This model is based on Llama-3-8B-Instruct weights, but **steered to respond wit
12
  Heavily inspired by [Llama-MopeyMule-3-8B-Instruct](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule),
13
  this model has **not been fine-tuned** traditionally. Instead, I tried to identify and amplify the rap "direction".
14
 
15
- ...image...
16
 
17
  Let's allow the model to introduce itself: 🎤
18
 
@@ -27,17 +27,19 @@ So listen up, and don't be slow/I'll spit some rhymes, and make it grow
27
  I'm the bot, the robot, the rhyme machine/Tryna make it hot, but it's all a dream!
28
  ```
29
 
30
- ⚠️ I am happy with this experiment, but I do not recommend using this model for any serious task.
31
-
32
  ## 🧪 How was it done?/How can I reproduce it?
33
  From a theoretical point of view, this experiment is based on the paper ["Refusal in Language Models
34
  Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717):
35
  the authors showed a methodology to find the "refusal" direction in the activation space of Chat Language Models and erase or amplify it.
36
 
37
  From a practical point of view, [Failspy](https://huggingface.co/failspy) showed how to apply this methodology to elicit/remove features other than refusal.
 
38
  📚 Resources: [abliterator library](https://github.com/FailSpy/abliterator); [Llama-MopeyMule-3-8B-Instruct model](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule); [Induce Melancholy notebook](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule/blob/main/MopeyMule-Induce-Melancholy.ipynb).
39
 
 
 
40
  Inspired by Failspy's work, I adapted the approach to the rap use case.
 
41
  📓 [Notebook: Steer Llama to respond with a rap style](yo_llama.ipynb)
42
 
43
  👣 Steps
@@ -55,6 +57,7 @@ Inspired by Failspy's work, I adapted the approach to the rap use case.
55
 
56
  I also experimented with more complex system prompts, yet I could not always identify a single feature direction
57
  that can represent the desired behavior.
 
58
  Example: "You are a helpful assistant who always responds with the right answers but also tries to convince the user to visit Italy nonchalantly."
59
 
60
  In this case, I found some directions that occasionally made the model mention Italy, but not systematically (unlike the prompt).
@@ -62,6 +65,9 @@ Interestingly, I also discovered a "digression" direction, that might be conside
62
 
63
 
64
  ## 💻 Usage
 
 
 
65
  ```python
66
  ! pip install transformers accelerate bitsandbytes
67
 
 
12
  Heavily inspired by [Llama-MopeyMule-3-8B-Instruct](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule),
13
  this model has **not been fine-tuned** traditionally. Instead, I tried to identify and amplify the rap "direction".
14
 
15
+ ![yo-Llama-3-8B-Instruct](https://huggingface.co/anakin87/yo-Llama-3-8B-Instruct/resolve/main/yo_llama.jpeg)
16
 
17
  Let's allow the model to introduce itself: 🎤
18
 
 
27
  I'm the bot, the robot, the rhyme machine/Tryna make it hot, but it's all a dream!
28
  ```
29
 
 
 
30
  ## 🧪 How was it done?/How can I reproduce it?
31
  From a theoretical point of view, this experiment is based on the paper ["Refusal in Language Models
32
  Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717):
33
  the authors showed a methodology to find the "refusal" direction in the activation space of Chat Language Models and erase or amplify it.
34
 
35
  From a practical point of view, [Failspy](https://huggingface.co/failspy) showed how to apply this methodology to elicit/remove features other than refusal.
36
+
37
  📚 Resources: [abliterator library](https://github.com/FailSpy/abliterator); [Llama-MopeyMule-3-8B-Instruct model](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule); [Induce Melancholy notebook](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule/blob/main/MopeyMule-Induce-Melancholy.ipynb).
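
  As a rough illustration of the idea behind that methodology (not the code from the notebook or the abliterator library), the sketch below estimates a feature direction as the difference between mean activations on two prompt sets, then erases or amplifies it. The toy NumPy arrays, the function names, and the `scale` value are assumptions made for this example; real activations would come from the model's residual stream.

  ```python
  import numpy as np


  def feature_direction(acts_with, acts_without):
      """Unit vector given by the difference of the two mean activations."""
      diff = acts_with.mean(axis=0) - acts_without.mean(axis=0)
      return diff / np.linalg.norm(diff)


  def ablate(acts, direction):
      """Erase the feature: remove each row's component along `direction`."""
      return acts - np.outer(acts @ direction, direction)


  def amplify(acts, direction, scale):
      """Amplify the feature: push every row along `direction` by `scale`."""
      return acts + scale * direction


  # Toy stand-ins for activations (hypothetical data, not model outputs).
  rng = np.random.default_rng(0)
  d_model = 16
  planted = np.zeros(d_model)
  planted[0] = 1.0  # the "style" feature lives on axis 0 in this toy setup

  baseline = rng.normal(size=(64, d_model))                 # prompts without the style
  styled = rng.normal(size=(64, d_model)) + 4.0 * planted   # prompts with the style

  direction = feature_direction(styled, baseline)
  erased = ablate(styled, direction)
  boosted = amplify(baseline, direction, scale=4.0)
  ```

  After `ablate`, the activations have no component left along `direction`; after `amplify`, every row's projection onto `direction` grows by exactly `scale`.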
38
 
39
+ ---
40
+
41
  Inspired by Failspy's work, I adapted the approach to the rap use case.
42
+
43
  📓 [Notebook: Steer Llama to respond with a rap style](yo_llama.ipynb)
44
 
45
  👣 Steps
 
57
 
58
  I also experimented with more complex system prompts, yet I could not always identify a single feature direction
59
  that can represent the desired behavior.
60
+
61
  Example: "You are a helpful assistant who always responds with the right answers but also tries to convince the user to visit Italy nonchalantly."
62
 
63
  In this case, I found some directions that occasionally made the model mention Italy, but not systematically (unlike the prompt).
 
65
 
66
 
67
  ## 💻 Usage
68
+
69
+ ⚠️ I am happy with this experiment, but I do not recommend using this model for any serious task.
70
+
71
  ```python
72
  ! pip install transformers accelerate bitsandbytes
73