# Llama3-ThinkQ8

A fine-tuned version of Llama 3 that shows explicit thinking using `` and `` tags. This model is quantized to 8-bit (Q8) for efficient inference.

## Model Details

- **Base Model**: Llama 3
- **Quantization**: 8-bit (Q8)
- **Special Feature**: Explicit thinking process with tags

## How to Use with Ollama

### 1. Install Ollama

If you haven't already installed Ollama, follow the instructions at [ollama.ai](https://ollama.ai).

### 2. Download the model file

Download the GGUF file from this repository.

### 3. Create the Ollama model

Create a file named `Modelfile` with this content:

```
FROM llama3-thinkQ8.gguf

# Model parameters
PARAMETER temperature 0.8
PARAMETER top_p 0.9

# System prompt
SYSTEM """You are a helpful assistant. You will check the user request and you will think and generate brainstorming and self-thoughts in your mind and respond only in the following format: {your thoughts here} {your final answer here} . Use the tags once and place all your output inside them ONLY"""
```

Then run:

```bash
ollama create llama3-think -f Modelfile
```

### 4. Run the model

```bash
ollama run llama3-think
```

## Example Prompts

Try these examples:

```
Using each number in this tensor ONLY once (5, 8, 3) and any arithmetic operation like add, subtract, multiply, divide, create an equation that equals 19.
```

```
Explain the concept of quantum entanglement to a high school student.
```
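
## Calling the Model from Code

If you want to query the model from a script rather than the interactive `ollama run` session, you can go through Ollama's local REST API. The snippet below is a minimal sketch, assuming Ollama is running on its default port (11434) and that the model was created with the name `llama3-think` as in step 3; the `ask` helper and `MODEL_NAME` constant are just illustrative names. You can confirm the model exists beforehand with `ollama list`.

```python
import requests

# Minimal sketch: send one prompt to the model created in step 3 via
# Ollama's local REST API (assumes Ollama is running on the default
# port 11434 and the model is named "llama3-think").
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3-think"


def ask(prompt: str) -> str:
    """Return the model's full, non-streamed reply for a single prompt."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL_NAME, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    reply = ask(
        "Using each number in this tensor ONLY once (5, 8, 3) and any "
        "arithmetic operation like add, subtract, multiply, divide, "
        "create an equation that equals 19."
    )
    # The reply contains the model's thinking followed by the final
    # answer, wrapped in the tags defined by the system prompt.
    print(reply)
```

The returned string contains the thinking portion followed by the final answer, wrapped in the tags defined by the system prompt, so you can post-process or strip the thinking section as needed for your application.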