# Inference Endpoint Configuration (v0.1.0)

## Model Details

- **Model**: MORBID-Actuarial v0.1.0 Conversational
- **Type**: Causal Language Model (Conversational)
- **Base**: TinyLlama-1.1B
- **Handler**: Custom handler.py included

## Recommended Configuration

- **Instance Type**: GPU Small (1x NVIDIA T4)
- **Framework**: PyTorch
- **Task**: Text Generation
- **Replicas**: 1 (scale up as usage grows)

## Environment Variables

No additional environment variables are required.

## Example Usage

```python
import requests

API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Test query (tighter generation under the hood)
output = query({
    "inputs": "Human: Help me price a level annuity immediate paying 1000/year at 5% Assistant:",
    "parameters": {
        "max_new_tokens": 220,
        "temperature": 0.35,
        "top_p": 0.9,
        "repetition_penalty": 1.15
    }
})
```

## Features

- Conversational AI with personality
- Actuarial expertise (97.8% accuracy on exams)
- Multi-turn context retention and reduced artifacts
- Tighter generation and stop handling