---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- dpo
- preference-learning
- implicit
- pruned
---

# implicit_reward_Qwen2.5-0.5B-Instruct_prune_0.5-sigmoid

This model is a DPO (Direct Preference Optimization) fine-tuned version of Qwen2.5-0.5B-Instruct, trained with the implicit-reward method.
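
The "-sigmoid" suffix in the model name refers to the standard sigmoid-based DPO objective. For reference, a minimal sketch of that loss is shown below; the `beta` value is illustrative and not taken from this model's actual training configuration.

```python
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sigmoid DPO loss: -log sigmoid(beta * implicit-reward margin)."""
    # Implicit rewards are the log-prob ratios between policy and reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response to be preferred over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```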

## Model Details

- **Base Model**: Qwen2.5-0.5B-Instruct
- **Training Method**: implicit (implicit-reward DPO)
- **Pruning Ratio**: not specified (the "prune_0.5" in the model name suggests 0.5)
- **Training Date**: 2025-09-07

## Training Configuration

This model was trained using Direct Preference Optimization (DPO) with the following characteristics:

- Method: implicit
- Pruning applied during training
- Fine-tuned on preference data
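
The exact hyperparameters and dataset are not reported in this card. For orientation only, a hypothetical DPO training setup with the `trl` library might look like the sketch below; the dataset name, `beta`, batch size, and epoch count are placeholders, and the pruning step used for this checkpoint is not shown.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Any preference dataset with "prompt", "chosen", "rejected" columns
# (placeholder; the card does not name the actual training data).
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-qwen2.5-0.5b",
    beta=0.1,                    # illustrative value
    loss_type="sigmoid",         # matches the "-sigmoid" suffix in the model name
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # use `tokenizer=` on older trl versions
)
trainer.train()
```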

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "5456es/implicit_reward_Qwen2.5-0.5B-Instruct_prune_0.5-sigmoid"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage: format the prompt with the chat template,
# since the base model is instruction-tuned.
messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Data

This model was trained on preference data using the DPO algorithm.
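
The specific dataset is not identified in this card. For illustration, a single preference-pair record in the format most DPO pipelines expect looks like the following (the field contents here are invented examples):

```python
# Hypothetical preference-pair record using the "prompt"/"chosen"/"rejected"
# schema commonly used for DPO training data.
example = {
    "prompt": "Explain what model pruning is.",
    "chosen": "Model pruning removes parameters that contribute little to ...",
    "rejected": "Pruning is when you trim the branches of a tree ...",
}
```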

## Limitations

This model inherits the limitations of its base model and may have additional limitations due to the pruning process.

## Citation

If you use this model, please cite the original DPO paper (Rafailov et al., 2023, "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", arXiv:2305.18290) and the Qwen2.5 base model.