mlc-ai/Mistral-7B-Instruct-v0.2-q3f16_1-MLC

by spydaz - opened Feb 6, 2024

base: refs/heads/main ← from: refs/pr/1

-4395

This PR is in draft mode

Files changed (3)

README.md +0 -57
mlc-chat-config.json +2 -35
tensor-cache.json +0 -0

README.md DELETED

@@ -1,57 +0,0 @@
----
-library_name: mlc-llm
-base_model: mistralai/Mistral-7B-Instruct-v0.2
-tags:
-- mlc-llm
-- web-llm
----
-# Mistral-7B-Instruct-v0.2-q3f16_1-MLC
-This is the [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model in MLC format `q3f16_1`.
-The model can be used for projects [MLC-LLM](https://github.com/mlc-ai/mlc-llm) and [WebLLM](https://github.com/mlc-ai/web-llm).
-## Example Usage
-Here are some examples of using this model in MLC LLM.
-Before running the examples, please install MLC LLM by following the [installation documentation](https://llm.mlc.ai/docs/install/mlc_llm.html#install-mlc-packages).
-### Chat
-In command line, run
-```bash
-mlc_llm chat HF://mlc-ai/Mistral-7B-Instruct-v0.2-q3f16_1-MLC
-```
-### REST Server
-In command line, run
-```bash
-mlc_llm serve HF://mlc-ai/Mistral-7B-Instruct-v0.2-q3f16_1-MLC
-```
-### Python API
-```python
-from mlc_llm import MLCEngine
-# Create engine
-model = "HF://mlc-ai/Mistral-7B-Instruct-v0.2-q3f16_1-MLC"
-engine = MLCEngine(model)
-# Run chat completion in OpenAI API.
-for response in engine.chat.completions.create(
-    messages=[{"role": "user", "content": "What is the meaning of life?"}],
-    model=model,
-    stream=True,
-):
-    for choice in response.choices:
-        print(choice.delta.content, end="", flush=True)
-print("\n")
-engine.terminate()
-```
-## Documentation
-For more information on MLC LLM project, please visit our [documentation](https://llm.mlc.ai/docs/) and [GitHub repo](http://github.com/mlc-ai/mlc-llm).

mlc-chat-config.json CHANGED

@@ -15,8 +15,7 @@
     "sliding_window_size": 1024,
     "prefill_chunk_size": 128,
     "attention_sink_size": 4,
-    "tensor_parallel_shards": 1,
-    "max_batch_size": 80
+    "tensor_parallel_shards": 1
   },
   "vocab_size": 32000,
   "context_window_size": -1,
@@ -30,39 +29,7 @@
   "temperature": 0.7,
   "repetition_penalty": 1.0,
   "top_p": 0.95,
-  "conv_template": {
-    "name": "mistral_default",
-    "system_template": "[INST] {system_message}",
-    "system_message": "Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.",
-    "system_prefix_token_ids": [
-      1
-    ],
-    "add_role_after_system_message": false,
-    "roles": {
-      "user": "[INST]",
-      "assistant": "[/INST]",
-      "tool": "[INST]"
-    },
-    "role_templates": {
-      "user": "{user_message}",
-      "assistant": "{assistant_message}",
-      "tool": "{tool_message}"
-    },
-    "messages": [],
-    "seps": [
-      " "
-    ],
-    "role_content_sep": " ",
-    "role_empty_sep": "",
-    "stop_str": [
-      "</s>"
-    ],
-    "stop_token_ids": [
-      2
-    ],
-    "function_string": "",
-    "use_function_calling": false
-  },
+  "conv_template": "mistral_default",
   "pad_token_id": 0,
   "bos_token_id": 1,
   "eos_token_id": 2,
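
The mlc-chat-config.json change above replaces the inline `conv_template` object with the bare name `"mistral_default"` and drops `max_batch_size`. For reviewers who want to check which form a given config file uses, here is a minimal sketch. The helper names `find_key` and `summarize_chat_config` are hypothetical, and the `model_config` key in the sample data is an assumption, since the hunk does not show the name of the enclosing object. This only inspects the JSON; it says nothing about which form a particular MLC LLM version accepts.

```python
def find_key(obj, key):
    """Return the first value stored under `key` in nested dicts, or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        for value in obj.values():
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def summarize_chat_config(config):
    """Summarize the fields this PR changes in mlc-chat-config.json."""
    conv = config.get("conv_template")
    if isinstance(conv, dict):
        form, name = "inline", conv.get("name")
    elif isinstance(conv, str):
        form, name = "named", conv
    else:
        form, name = "missing", None
    return {
        "conv_template_form": form,
        "conv_template_name": name,
        "max_batch_size": find_key(config, "max_batch_size"),
        "tensor_parallel_shards": find_key(config, "tensor_parallel_shards"),
    }


# Trimmed stand-ins for the two sides of the diff. The "model_config" key is
# an assumption; the hunk does not show the name of the enclosing object.
before = {
    "model_config": {"tensor_parallel_shards": 1, "max_batch_size": 80},
    "conv_template": {"name": "mistral_default"},
}
after = {
    "model_config": {"tensor_parallel_shards": 1},
    "conv_template": "mistral_default",
}

print(summarize_chat_config(before))
print(summarize_chat_config(after))
```

To run it against a real file, load the file with `json.load` and pass the resulting dict to `summarize_chat_config`.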

tensor-cache.json DELETED

The diff for this file is too large to render.