Commit c40edea (verified) by suhara · 1 parent: 4b30a3f

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -17,9 +17,10 @@ tags:
 
 # Llama-3.1-Nemotron-Nano-4B-v1.1
 
-
 ## Model Overview
 
+![Accuracy Comparison Plot](./accuracy_plot.png)
+
 Llama-3.1-Nemotron-Nano-4B-v1.1 is a large language model (LLM) which is a derivative of [nvidia/Llama-3.1-Minitron-4B-Width-Base](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which is created from Llama 3.1 8B using [our LLM compression technique](https://arxiv.org/abs/2408.11796) and offers improvements in model accuracy and efficiency. It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling.
 
 Llama-3.1-Nemotron-Nano-4B-v1.1 is a model which offers a great tradeoff between model accuracy and efficiency. The model fits on a single RTX GPU and can be used locally. The model supports a context length of 128K.
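
Reviewer note: the README's claim that the model "fits on a single RTX GPU" can be sanity-checked with a rough weight-memory estimate. The sketch below is not from the README. It assumes that "4B" in the model name means roughly 4e9 parameters (the exact count is not stated here), and it counts weights only, ignoring the KV cache and activations, which grow with context length. The function name `weight_memory_gib` is a hypothetical helper.

```python
# Rough estimate of the memory needed to hold model weights only.
# ASSUMPTION: "4B" is taken to mean about 4e9 parameters; the exact
# parameter count is not given in the README text shown in this diff.
# KV cache and activation memory are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the weight footprint in GiB for a given parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(4e9, dtype):.1f} GiB")
```

Under these assumptions the bf16 weights come to about 7.5 GiB, which is consistent with the single-GPU claim for cards with more memory than that. Long-context use near 128K would need additional headroom for the KV cache.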