nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1



suhara committed on May 20

Commit c40edea (verified) · 1 parent: 4b30a3f

Update README.md

Files changed (1)

README.md +2 -1


@@ -17,9 +17,10 @@ tags:
 # Llama-3.1-Nemotron-Nano-4B-v1.1
 ## Model Overview
+![Accuracy Comparison Plot](./accuracy_plot.png)
 Llama-3.1-Nemotron-Nano-4B-v1.1 is a large language model (LLM) which is a derivative of [nvidia/Llama-3.1-Minitron-4B-Width-Base](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which is created from Llama 3.1 8B using [our LLM compression technique](https://arxiv.org/abs/2408.11796) and offers improvements in model accuracy and efficiency. It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling.
 Llama-3.1-Nemotron-Nano-4B-v1.1 is a model which offers a great tradeoff between model accuracy and efficiency. The model fits on a single RTX GPU and can be used locally. The model supports a context length of 128K.