Dhee-NxtGen-Qwen3-Marathi-v2

Model Description

Dhee-NxtGen-Qwen3-Marathi-v2 is a large language model developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd.
It is an approximately 2B-parameter model based on the Qwen3 architecture, distributed as FP16 safetensors, and fine-tuned for assistant-style, function-calling, and reasoning-based tasks in Marathi.

This model produces natural, fluent, and contextually accurate Marathi text, making it well suited for conversational AI, reasoning systems, and domain-specific dialogue agents.

Key Features

  • Fluent and context-aware Marathi text generation
  • Optimized for assistant-style and reasoning conversations
  • Handles question answering, summarization, and creative writing
  • Fully compatible with 🤗 Hugging Face Transformers
  • Supports vLLM for high-performance batched inference

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dhee-nxtgen-qwen3-marathi-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Example ChatML prompt; the user asks, in Marathi: "Can you schedule an appointment for me?"
prompt = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
तुम्ही माझ्यासाठी अपॉइंटमेंट शेड्यूल करू शकता का?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
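
If the tokenizer ships a chat template (standard for Qwen3-based models, though not verified for this fine-tune), the same prompt can be built with apply_chat_template instead of hand-written ChatML markers; a minimal sketch:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # Marathi: "Can you schedule an appointment for me?"
    {"role": "user", "content": "तुम्ही माझ्यासाठी अपॉइंटमेंट शेड्यूल करू शकता का?"},
]

# Renders the conversation with the model's own template and appends
# the assistant header so generation starts at the right position.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))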

Intended Uses & Limitations

Intended Uses

  • Marathi conversational chatbots and assistants
  • Function-calling and structured response generation (a sketch follows this list)
  • Story generation and summarization in Marathi
  • Natural dialogue systems for Indic AI applications
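
A minimal sketch of the function-calling flow, assuming the tokenizer carries Qwen3's tool-aware chat template; the schedule_appointment schema below is purely illustrative and not part of this model:

messages = [
    {"role": "user", "content": "Please book an appointment for Friday morning."}
]

# Hypothetical tool schema for illustration; real deployments define their own.
tools = [{
    "type": "function",
    "function": {
        "name": "schedule_appointment",
        "description": "Book an appointment slot",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "Requested date"},
            },
            "required": ["date"],
        },
    },
}]

# If the chat template supports tools, the schema is injected into the prompt.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# The exact structured output (e.g. a <tool_call> JSON block) depends on the template.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))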

Limitations

  • May generate inaccurate or biased responses in rare cases
  • Performance can vary on out-of-domain or code-mixed inputs
  • Primarily optimized for Marathi; other languages may produce less fluent results

vLLM / High-Performance Serving Requirements

For high-throughput serving with vLLM, ensure the following environment:

  • GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100); a quick check is shown after this list
  • PyTorch 2.1+ and CUDA toolkit installed
  • For V100 GPUs (sm70), vLLM GPU inference is not supported; CPU fallback is possible but slower.
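
To confirm the compute-capability requirement above, PyTorch can report it directly:

import torch

# (major, minor) compute capability of GPU 0, e.g. (8, 0) for an A100
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (8, 0):
    print("vLLM GPU inference may be unsupported on this card.")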

Install dependencies:

pip install torch transformers vllm sentencepiece

Run vLLM server:

vllm serve dheeyantra/dhee-nxtgen-qwen3-marathi-v2 --host 0.0.0.0 --port 8000
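
The server exposes an OpenAI-compatible API, so it can be queried with the standard openai client; the base_url and api_key values below are placeholders for a local deployment:

from openai import OpenAI

# Point the client at the local vLLM server started above
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dheeyantra/dhee-nxtgen-qwen3-marathi-v2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # Marathi: "Can you schedule an appointment for me?"
        {"role": "user", "content": "तुम्ही माझ्यासाठी अपॉइंटमेंट शेड्यूल करू शकता का?"},
    ],
    max_tokens=150,
)
print(response.choices[0].message.content)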

License

Released under the Apache 2.0 License.


Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd.
