Committed by stojchet
Commit f6b222d (verified)
1 parent: cf91b1a

End of training

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.9955
-- Rewards/chosen: -57.0781
-- Rewards/rejected: -325.9450
-- Rewards/accuracies: 0.9119
-- Rewards/margins: 268.8669
-- Logps/rejected: -3333.4583
-- Logps/chosen: -608.1487
-- Logits/rejected: -6.5634
-- Logits/chosen: -9.0439
+- Loss: 18.2999
+- Rewards/chosen: -55.1127
+- Rewards/rejected: -55.1897
+- Rewards/accuracies: 0.4073
+- Rewards/margins: 0.0770
+- Logps/rejected: -625.9051
+- Logps/chosen: -588.4946
+- Logits/rejected: -8.9525
+- Logits/chosen: -8.9519
 
 ## Model description
 
@@ -61,7 +61,7 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 13.2368 | 2.3088 | 100 | 2.9955 | -57.0781 | -325.9450 | 0.9119 | 268.8669 | -3333.4583 | -608.1487 | -6.5634 | -9.0439 |
+| 22.9478 | 2.3088 | 100 | 18.2999 | -55.1127 | -55.1897 | 0.4073 | 0.0770 | -625.9051 | -588.4946 | -8.9525 | -8.9519 |
 
 
 ### Framework versions
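The metric names in this diff (Rewards/chosen, Rewards/rejected, Rewards/margins, Logps/..., Logits/...) match what Hugging Face TRL's DPOTrainer logs, although the model card itself does not say which trainer produced them. Assuming that convention, Rewards/margins is the mean of (chosen reward - rejected reward) over the evaluation pairs, so it should equal Rewards/chosen minus Rewards/rejected. The sketch below checks that relationship against the values on both sides of this commit. Rewards/accuracies (the fraction of pairs whose chosen reward beats the rejected one) cannot be recovered from the means alone, so it is left out. The names `EVAL_RESULTS`, `implied_margin`, and `check` are hypothetical helpers for illustration.

```python
# Hedged sketch: assumes the README metrics follow the DPO logging convention of
# TRL's DPOTrainer, where Rewards/margins = mean(chosen_reward - rejected_reward).
# The numbers are copied from the evaluation results on each side of the diff.

EVAL_RESULTS = {
    "parent cf91b1a": {"chosen": -57.0781, "rejected": -325.9450, "margin": 268.8669},
    "this commit f6b222d": {"chosen": -55.1127, "rejected": -55.1897, "margin": 0.0770},
}


def implied_margin(chosen: float, rejected: float) -> float:
    """Margin implied by the mean rewards (the mean of a difference is the difference of the means)."""
    return chosen - rejected


def check(results: dict, tol: float = 1e-3) -> dict:
    """Return {label: (implied margin rounded to 4 places, whether it matches the reported margin)}."""
    report = {}
    for label, metrics in results.items():
        margin = implied_margin(metrics["chosen"], metrics["rejected"])
        report[label] = (round(margin, 4), abs(margin - metrics["margin"]) <= tol)
    return report


if __name__ == "__main__":
    for label, (margin, ok) in check(EVAL_RESULTS).items():
        print(f"{label}: implied margin {margin:.4f}, matches reported: {ok}")
```

Both revisions pass the check, so the change in this commit is internally consistent: the reported margin falls from 268.8669 to 0.0770 while Rewards/accuracies drops from 0.9119 to 0.4073.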