jukofyork/DeepSeek-V3-DRAFT-0.6B-v3.0

speculative-decoding

jukofyork committed on Aug 10

Commit b9878ad · verified

1 Parent(s): 00805f0

Update README.md

Files changed (1)

README.md +3 -1

@@ -234,4 +234,6 @@ drop_tails = true
 ```
-I used six `RTX A6000` GPUs over three nodes and hence the `60` batch size (`6 x 10 gradient accumulation steps = 60`).

 ```
+I used six `RTX A6000` GPUs over three nodes and hence the `60` batch size (`6 x 10 gradient accumulation steps = 60`):
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/GHyDC4c8zR34i_VfCjYKn.png)