nvidia/Llama-3.1-70B-Instruct-FP8

Text Generation

text-generation-inference

Update README.md (#1)

by omrialmog - opened Feb 26

base: refs/heads/main ← from: refs/pr/1

Files changed (1)

README.md +25 -18

@@ -1,6 +1,9 @@
 ---
 base_model:
 - meta-llama/Llama-3.1-70B-Instruct
+license: llama3.1
+pipeline_tag: text-generation
+library_name: transformers
 ---
 # Model Overview
@@ -77,39 +80,37 @@ python examples/llama/convert_checkpoint.py --model_dir Llama-3.1-70B-Instruct-F
 trtllm-build --checkpoint_dir /ckpt --output_dir /engine
 ```
-* Accuracy evaluation:
-1) Prepare the MMLU dataset:
-```sh
-mkdir data; wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
-tar -xf data/mmlu.tar -C data && mv data/data data/mmlu
-```
-2) Measure MMLU:
-```sh
-python examples/mmlu.py --engine_dir ./engine --tokenizer_dir Llama-3.1-70B-Instruct-FP8/ --test_trt_llm --data_dir data/mmlu
-```
 * Throughputs evaluation:
 Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/Suite.md) for details.
 #### Evaluation
-The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark results are presented in the table below:
 <table>
   <tr>
    <td><strong>Precision</strong>
    </td>
    <td><strong>MMLU</strong>
    </td>
+   <td><strong>GSM8K (CoT) </strong>
+   </td>
+   <td><strong>ARC Challenge</strong>
+   </td>
+   <td><strong>IFEVAL</strong>
+   </td>
    <td><strong>TPS</strong>
    </td>
   </tr>
   <tr>
-   <td>FP16
+   <td>BF16
+   </td>
+   <td>83.3
+   </td>
+   <td>95.3
    </td>
-   <td>82.5
+   <td>93.7
+   </td>
+   <td>92.1
    </td>
    <td>1356.92
    </td>
@@ -117,7 +118,13 @@ The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark r
   <tr>
    <td>FP8
    </td>
-   <td>82.3
+   <td>83.2
+   </td>
+   <td>94.3
+   </td>
+   <td>93.2
+   </td>
+   <td>92.2
    </td>
    <td>2040.30
    </td>
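As a reading aid for the updated table, the Python sketch below recomputes the FP8-versus-BF16 score differences and the throughput ratio. The numbers are copied from the `+` lines of this diff. The dictionary names and the helpers `accuracy_deltas` and `speedup` are illustrative assumptions and are not part of the model card or of TensorRT-LLM.

```python
# Scores copied from the updated README table in this PR.
# Accuracy columns are in points; "TPS" is throughput in tokens per second.
# The dict names and helper functions are hypothetical, for illustration only.
BF16 = {"MMLU": 83.3, "GSM8K (CoT)": 95.3, "ARC Challenge": 93.7, "IFEVAL": 92.1, "TPS": 1356.92}
FP8 = {"MMLU": 83.2, "GSM8K (CoT)": 94.3, "ARC Challenge": 93.2, "IFEVAL": 92.2, "TPS": 2040.30}


def accuracy_deltas(base: dict, quant: dict) -> dict:
    """Score change (quantized minus baseline) for every accuracy column."""
    # Round to one decimal to match the table's precision and hide float noise.
    return {k: round(quant[k] - base[k], 1) for k in base if k != "TPS"}


def speedup(base: dict, quant: dict) -> float:
    """Throughput ratio of the quantized engine over the baseline engine."""
    return quant["TPS"] / base["TPS"]


if __name__ == "__main__":
    for name, delta in accuracy_deltas(BF16, FP8).items():
        print(f"{name}: {delta:+.1f}")
    print(f"FP8 / BF16 throughput: {speedup(BF16, FP8):.2f}x")
```

With these values the ratio is roughly 1.50x. The largest accuracy change is the 1.0-point drop on GSM8K (CoT), and IFEVAL moves up by 0.1. The excerpt shown here does not state the batch size or sequence lengths behind the TPS figures, so the ratio only describes whatever setup produced them.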