# Inference Endpoint Configuration (v0.1.0)

## Model Details

- **Model**: MORBID-Actuarial v0.1.0 Conversational
- **Type**: Causal Language Model (Conversational)
- **Base**: TinyLlama-1.1B
- **Handler**: Custom handler.py included

## Recommended Configuration

- **Instance Type**: GPU Small (1x NVIDIA T4)
- **Framework**: PyTorch
- **Task**: Text Generation
- **Replicas**: 1 (scale up as usage grows)

## Environment Variables

No additional environment variables are required.

## Example Usage

```python
import requests

API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Test query (tighter generation under the hood)
output = query({
    "inputs": "Human: Help me price a level annuity immediate paying 1000/year at 5% Assistant:",
    "parameters": {
        "max_new_tokens": 220,
        "temperature": 0.35,
        "top_p": 0.9,
        "repetition_penalty": 1.15
    }
})
```

## Features

- Conversational AI with personality
- Actuarial expertise (97.8% accuracy on exams)
- Multi-turn context retention and reduced artifacts
- Tighter generation and stop handling