Dhee-NxtGen-Qwen3-Kannada-v2

Model Description

Dhee-NxtGen-Qwen3-Kannada-v2 is a large language model designed for advanced Kannada language understanding and generation.
It is based on the Qwen3 architecture and fine-tuned for assistant-style, function-calling, and reasoning-based conversational tasks.

Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd., this model is ideal for building intelligent Kannada chatbots, reasoning systems, and task-based dialogue agents.

Key Features

  • Fluent, context-aware Kannada text generation
  • Optimized for assistant-style and reasoning conversations
  • Handles open-ended generation, summarization, and Q&A
  • Fully compatible with 🤗 Hugging Face Transformers
  • Supports vLLM for high-performance inference

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dhee-nxtgen-qwen3-kannada-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Example prompt in the Qwen chat format
# (user turn: "Can you schedule an appointment for me?")
prompt = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
ನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
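
If the tokenizer ships with a chat template (Qwen3 tokenizers typically do), the same prompt can be built with tokenizer.apply_chat_template instead of hand-writing the special tokens; a minimal sketch:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "ನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?"},
]

# The template inserts the <|im_start|>/<|im_end|> markers automatically.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)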

Intended Uses & Limitations

Intended Uses

  • Kannada conversational chatbots and assistants
  • Function-calling and structured response generation (see the sketch after this list)
  • Story generation and summarization in Kannada
  • Natural dialogue systems for Indic AI applications
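
For the function-calling use case, a minimal sketch, assuming the model follows the base Qwen3 tool-calling convention (tool schemas passed to the chat template, calls emitted in <tool_call> tags); the book_appointment tool is hypothetical, and tokenizer is the one loaded in the example above:

# Hypothetical tool, for illustration only (OpenAI-style JSON schema).
book_appointment = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment on a given date and time.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "YYYY-MM-DD"},
                "time": {"type": "string", "description": "HH:MM, 24-hour clock"},
            },
            "required": ["date", "time"],
        },
    },
}

messages = [
    {"role": "user", "content": "ನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?"},
]

# Recent Transformers chat templates accept a `tools` argument and render
# the schema into the system prompt; the model is then expected to reply
# with a <tool_call> JSON block that the caller parses and executes.
prompt = tokenizer.apply_chat_template(
    messages, tools=[book_appointment], tokenize=False, add_generation_prompt=True
)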

Limitations

  • May generate inaccurate or biased responses in rare cases
  • Performance can vary on out-of-domain or code-mixed inputs
  • Primarily optimized for Kannada; other languages may produce less fluent results

vLLM / High-Performance Serving Requirements

For high-throughput serving with vLLM, ensure the following environment:

  • GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100); see the check after this list
  • PyTorch 2.1+ and CUDA toolkit installed
  • For V100 GPUs (sm70), vLLM GPU inference is not supported; CPU fallback is possible but slower.
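
A quick way to verify the compute-capability requirement from the current PyTorch install:

import torch

# Returns e.g. (8, 0) on A100 or (7, 0) on V100; vLLM GPU inference
# for this model needs at least (8, 0).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: sm{major}{minor}")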

Install dependencies:

pip install torch transformers vllm sentencepiece
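
Once installed, vLLM can also run offline batch inference without starting a server; a minimal sketch using its LLM API (sampling values are illustrative):

from vllm import LLM, SamplingParams

llm = LLM(model="dheeyantra/dhee-nxtgen-qwen3-kannada-v2")
params = SamplingParams(temperature=0.7, max_tokens=150)

# Same Qwen-chat-formatted prompt as in the Transformers example above.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)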

Run vLLM server:

vllm serve dheeyantra/dhee-nxtgen-qwen3-kannada-v2 \
  --host 0.0.0.0 \
  --port 8000
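
The server exposes an OpenAI-compatible API under /v1; a minimal client sketch using the openai Python package (install it separately with pip install openai; the api_key value is a placeholder, since vLLM does not require one by default):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dheeyantra/dhee-nxtgen-qwen3-kannada-v2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # "Can you schedule an appointment for me?"
        {"role": "user", "content": "ನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?"},
    ],
    max_tokens=150,
)
print(response.choices[0].message.content)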

License

Released under the Apache 2.0 License.


Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd.
