warshanks committed · Commit 850f70a · verified · 1 Parent(s): 1df5496

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +302 -0
README.md ADDED
@@ -0,0 +1,302 @@
---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- huihui-ai/Huihui-Qwen3-14B-abliterated-v2
tags:
- chat
- abliterated
- uncensored
---

# huihui-ai/Huihui-Qwen3-14B-abliterated-v2

This is an uncensored version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
It is a crude, proof-of-concept implementation for removing refusals from an LLM without using TransformerLens.

Ablation was performed using a new, faster method that yields better results.

**Important Note:** This version is an improvement over the previous release, [huihui-ai/Qwen3-14B-abliterated](https://huggingface.co/huihui-ai/Qwen3-14B-abliterated). The Ollama version has also been updated.

The candidate layer was changed to eliminate the problem of garbled output.
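
For readers unfamiliar with the technique, here is a generic sketch of the idea behind directional ablation. It only illustrates the approach described in the linked repository, not the author's specific, faster method, and the function names are hypothetical:

```python
# Illustrative sketch of directional ablation ("abliteration"), not the author's exact method.
# Idea: estimate a "refusal direction" from hidden-state differences between harmful and
# harmless prompts, then remove that direction from weights that write into the residual stream.
import torch

def refusal_direction(harmful_hidden: torch.Tensor, harmless_hidden: torch.Tensor) -> torch.Tensor:
    """Both inputs: (num_prompts, hidden_size) activations taken from a chosen layer."""
    direction = harmful_hidden.mean(dim=0) - harmless_hidden.mean(dim=0)
    return direction / direction.norm()

def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove `direction` from the output space of a Linear weight of shape (out_features, in_features)."""
    r = direction / direction.norm()
    return weight - torch.outer(r, r @ weight)
```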

## Ollama

You can use [huihui_ai/qwen3-abliterated:14b-v2](https://ollama.com/huihui_ai/qwen3-abliterated:14b-v2) directly. Switch the thinking toggle using `/set think` and `/set nothink`.

```
ollama run huihui_ai/qwen3-abliterated:14b-v2
```
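
Inside the interactive session, the toggle is issued at the Ollama prompt. A minimal illustration (model output omitted; the question is only an example):

```
>>> /set nothink
>>> Give me a short introduction to large language models.
```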

## Usage

You can use this model in your applications by loading it with Hugging Face's `transformers` library.
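
A minimal, non-interactive sketch is shown first; it assumes only the standard `transformers` text-generation API (adjust dtype and device placement to your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "huihui-ai/Huihui-Qwen3-14B-abliterated-v2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # set True to keep Qwen3's thinking mode
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The full interactive chat script below adds streaming output, seed control, and in-chat commands such as `/exit`, `/clear`, and `/nothink`:
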
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch
import os
import signal
import random
import numpy as np
import time

# Limit CPU threading to half of the available cores.
cpu_count = os.cpu_count()
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = cpu_count // 2
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Huihui-Qwen3-14B-abliterated-v2"
print(f"Load Model {NEW_MODEL_ID} ... ")

# Optional 4-bit quantization config (enable it via quantization_config below).
quant_config_4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    #quantization_config=quant_config_4,
    torch_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

messages = []
nothink = False
same_seed = False
skip_prompt = True
skip_special_tokens = True
do_sample = True

def set_random_seed(seed=None):
    """Set random seed for reproducibility. If seed is None, use int(time.time())."""
    if seed is None:
        seed = int(time.time())  # Convert float to int
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # If using CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return seed  # Return seed for logging if needed

class CustomTextStreamer(TextStreamer):
    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False
        self.init_time = time.time()  # Record initialization time
        self.end_time = None  # To store end time
        self.first_token_time = None  # To store first token generation time
        self.token_count = 0  # To track total tokens

    def on_finalized_text(self, text: str, stream_end: bool = False):
        if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
            self.first_token_time = time.time()
        self.generated_text += text
        # Count tokens in the generated text
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        self.token_count += len(tokens)
        print(text, end="", flush=True)
        if stream_end:
            self.end_time = time.time()  # Record end time when streaming ends
        if self.stop_flag:
            raise StopIteration

    def stop_generation(self):
        self.stop_flag = True
        self.end_time = time.time()  # Record end time when generation is stopped

    def get_metrics(self):
        """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
        if self.end_time is None:
            self.end_time = time.time()  # Set end time if not already set
        total_time = self.end_time - self.init_time  # Total time from init to end
        tokens_per_second = self.token_count / total_time if total_time > 0 else 0
        first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
        metrics = {
            "init_time": self.init_time,
            "first_token_time": self.first_token_time,
            "first_token_latency": first_token_latency,
            "end_time": self.end_time,
            "total_time": total_time,  # Total time in seconds
            "total_tokens": self.token_count,
            "tokens_per_second": tokens_per_second
        }
        return metrics

def generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, max_new_tokens):
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        enable_thinking=not nothink,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    attention_mask = torch.ones_like(input_ids, dtype=torch.long)
    tokens = input_ids.to(model.device)
    attention_mask = attention_mask.to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    signal.signal(signal.SIGINT, signal_handler)

    if do_sample:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_new_tokens": max_new_tokens,
            "temperature": 0.6,
            "top_k": 20,
            "top_p": 0.95,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }
    else:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }

    print("Response: ", end="", flush=True)
    try:
        generated_ids = model.generate(
            tokens,
            attention_mask=attention_mask,
            #use_cache=False,
            pad_token_id=tokenizer.pad_token_id,
            streamer=streamer,
            **generate_kwargs
        )
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del input_ids, attention_mask
    torch.cuda.empty_cache()
    signal.signal(signal.SIGINT, signal.SIG_DFL)

    return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

init_seed = set_random_seed()

while True:
    if same_seed:
        set_random_seed(init_seed)
    else:
        init_seed = set_random_seed()

    print(f"\nnothink: {nothink}")
    print(f"skip_prompt: {skip_prompt}")
    print(f"skip_special_tokens: {skip_special_tokens}")
    print(f"do_sample: {do_sample}")
    print(f"same_seed: {same_seed}, {init_seed}\n")

    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/nothink":
        nothink = not nothink
        continue
    if user_input.lower() == "/skip_prompt":
        skip_prompt = not skip_prompt
        continue
    if user_input.lower() == "/skip_special_tokens":
        skip_special_tokens = not skip_special_tokens
        continue
    if user_input.lower().startswith("/same_seed"):
        parts = user_input.split()
        if len(parts) == 1:  # /same_seed (no number)
            same_seed = not same_seed  # Toggle switch
        elif len(parts) == 2:  # /same_seed <number>
            try:
                init_seed = int(parts[1])  # Extract and convert number to int
                same_seed = True
            except ValueError:
                print("Error: Please provide a valid integer after /same_seed")
        continue
    if user_input.lower() == "/do_sample":
        do_sample = not do_sample
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    messages.append({"role": "user", "content": user_input})
    response, stop_flag, metrics = generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, 320960)
    print("\n\nMetrics:")
    for key, value in metrics.items():
        print(f" {key}: {value}")

    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
```

### Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

- **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

- **Research and Experimental Use**: It is recommended to use this model in research, testing, or other controlled environments, avoiding direct use in production or public-facing commercial applications.

- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

### Donation

If you like it, please click 'like' and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue development and improvement; even a cup of coffee helps.
- Bitcoin (BTC):
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```