### Architecture

| Parameter             | GRAG-PHI-SFT                                                     |
|-----------------------|------------------------------------------------------------------|
| **d_model**           | 3072                                                             |
| **num heads**         | 32                                                               |
| **num layers**        | 32                                                               |
| **MLP ratio**         | 2.66                                                             |
| **LayerNorm type**    | RMSNorm                                                          |
| **pos embeddings**    | RoPE                                                             |
| **attention variant** | Standard multi-head self-attention with a sliding window of 2047 |
| **biases**            | none                                                             |
| **block type**        | sequential                                                       |
| **activation**        | SiLU                                                             |
| **sequence length**   | 131072                                                           |
| **weight tying**      | bfloat16                                                         |
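A few of the figures above can be sanity-checked directly. The per-head dimension follows from `d_model / num heads`, and RMSNorm (the LayerNorm variant named in the table) normalizes by the root-mean-square of the activations without subtracting the mean. A minimal sketch, using only the numbers from the table:

```python
import math

# Figures taken from the architecture table above.
d_model = 3072
num_heads = 32

# Each attention head operates on d_model / num_heads dimensions.
head_dim = d_model // num_heads
print(head_dim)  # 96

# Minimal RMSNorm sketch: scale by the root-mean-square, no mean subtraction.
# (Illustrative only; the model's actual implementation also has a learned gain.)
def rms_norm(x, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

print([round(v, 3) for v in rms_norm([1.0, 2.0, 2.0])])  # [0.577, 1.155, 1.155]
```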

### Hyperparameters