h3ir committed · Commit df51c26 · verified · 1 Parent(s): 8c10a27

Update inference config (v0.1.0 label)

Files changed (1)
  1. INFERENCE_CONFIG.md +11 -8
INFERENCE_CONFIG.md CHANGED
@@ -1,10 +1,10 @@
-# Inference Endpoint Configuration (v0.1.1)
+# Inference Endpoint Configuration (v0.1.0)
 
 ## Model Details
-- **Model**: MORBID-Actuarial v0.1.1 Conversational
+- **Model**: MORBID-Actuarial v0.1.0 Conversational
 - **Type**: Causal Language Model (Conversational)
 - **Base**: TinyLlama-1.1B
-- **Handler**: Custom handler.py (v0.1.1) included
+- **Handler**: Custom handler.py included
 
 ## Recommended Configuration
 - **Instance Type**: GPU Small (1x NVIDIA T4)
@@ -27,12 +27,15 @@ def query(payload):
     response = requests.post(API_URL, headers=headers, json=payload)
     return response.json()
 
-# Test queries
+# Test queries (tighter generation under the hood)
 output = query({
-    "inputs": "Hi, how are you?",
+    "inputs": "Human: Help me price a level annuity immediate paying 1000/year at 5%
+Assistant:",
     "parameters": {
-        "max_new_tokens": 100,
-        "temperature": 0.8
+        "max_new_tokens": 220,
+        "temperature": 0.35,
+        "top_p": 0.9,
+        "repetition_penalty": 1.15
     }
 })
 ```
@@ -41,4 +44,4 @@ output = query({
 - Conversational AI with personality
 - Actuarial expertise (97.8% accuracy on exams)
 - Multi-turn context retention and reduced artifacts
-- Tighter generation with stop handling and bad-words filtering
+- Tighter generation and stop handling
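
For readers wiring this up outside the diff view, below is a minimal, self-contained sketch of how the snippet above is typically pointed at a deployed Hugging Face Inference Endpoint. The endpoint URL, the HF_TOKEN environment variable, the raise_for_status() call, and the final print are placeholders added for illustration and are not part of INFERENCE_CONFIG.md; the prompt newline is written with an explicit \n so the example stays valid Python.

```python
import os

import requests

# Placeholder endpoint URL; replace with the URL shown on the endpoint's page.
API_URL = "https://your-endpoint.endpoints.huggingface.cloud"
headers = {
    "Authorization": f"Bearer {os.environ['HF_TOKEN']}",  # placeholder token variable
    "Content-Type": "application/json",
}

def query(payload):
    # POST the JSON payload to the endpoint and return the decoded JSON reply.
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

# Test query using the v0.1.0 generation settings from the diff above.
output = query({
    "inputs": "Human: Help me price a level annuity immediate paying 1000/year at 5%\nAssistant:",
    "parameters": {
        "max_new_tokens": 220,
        "temperature": 0.35,
        "top_p": 0.9,
        "repetition_penalty": 1.15,
    },
})
print(output)
```

The lower temperature and the repetition penalty favor short, numeric answers. As a sanity check on the reply, a level annuity-immediate of 1000/year at 5% has present value 1000 * (1 - 1.05**(-n)) / 0.05 for an n-year term; the prompt leaves n unspecified, so a reasonable response should either ask for the term or state an assumption.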