Neodac-mini: Northeast India Cultural AI Model
Neodac-mini (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region.
Model Overview
- Base Model: google/gemma-3-1b-it
- Specialization: Northeast India Cultural Knowledge
- Training Data: 6,205 culturally authentic Q&A pairs
- Coverage: All 8 Northeast Indian states
- Languages: English (with cultural context)
Key Features
Cultural Domains Covered
- Festivals & Celebrations: Bihu, Hornbill, Losar, Chapchar Kut, etc.
- Traditional Arts: Dance forms, music, crafts, weaving
- Cuisine: Regional foods, cooking methods, traditional recipes
- Tribal Heritage: Community practices, languages, customs
- Geography: Cultural significance of places and landmarks
- Literature: Folk tales, oral traditions, regional literature
Model Capabilities
- Cultural information with fewer hallucinations than the base model
- Detailed responses about regional traditions
- Authentic representation of tribal communities
- Contextual understanding of cultural nuances
- Preservation of cultural knowledge through AI
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini")
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/neodac-mini",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage
def ask_neodac_mini(question):
    # Gemma chat format: one user turn followed by an open model turn
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
    # Move the inputs to the device the model was placed on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=300,  # caps the answer length, excluding the prompt
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Ask about Northeast India culture
response = ask_neodac_mini("What is the significance of bamboo in Northeast India?")
print(response)
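The prompt string in the Quick Start follows Gemma's turn format: each turn is wrapped in `<start_of_turn>` and `<end_of_turn>`, and the prompt ends with an open `model` turn. The helper below is a minimal sketch of building that string for one or more turns. The function name `build_gemma_prompt` is hypothetical and not part of this model's repository. In practice, `tokenizer.apply_chat_template` is the usual way to produce the same layout.

```python
def build_gemma_prompt(turns):
    """Build a Gemma-style prompt from (role, text) pairs.

    Hypothetical helper for illustration; roles must be "user" or "model".
    The returned string ends with an open model turn for generation.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave the final model turn open so the model writes the reply
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


# Single-turn prompt, identical in layout to the Quick Start prompt
single = build_gemma_prompt([("user", "What is Bihu festival?")])

# Multi-turn prompt that carries an earlier exchange as context
multi = build_gemma_prompt([
    ("user", "What is Bihu festival?"),
    ("model", "Bihu is the most important festival of Assam."),
    ("user", "When is Rongali Bihu celebrated?"),
])
print(single)
```

Passing the earlier exchange as context lets a follow-up question refer back to it. Keep the total length within the sequence length the model can handle.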
Training Details
Dataset
- Size: 6,205 cultural Q&A pairs
- Sources: Regional cultural databases, wiki content, expert curation
- Quality: Manually verified for cultural authenticity
- Split: 90% training, 10% validation
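As a rough illustration of the 90/10 split, the sketch below shuffles placeholder records and cuts them at 90%. The seed, the record layout, and the rounding rule (rounding the training share down) are assumptions; the card does not say how the actual split was produced.

```python
import random


def split_dataset(pairs, train_fraction=0.9, seed=42):
    """Shuffle a copy of `pairs` and return (train, validation) lists."""
    shuffled = list(pairs)
    # A seeded Random instance keeps the split reproducible
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # rounds down (assumption)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for the 6,205 real Q&A pairs
pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(6205)]
train, val = split_dataset(pairs)
print(len(train), len(val))  # 5584 621
```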
Training Configuration
- Hardware: NVIDIA A40 40GB
- Epochs: 5 (increased from an initial 3)
- Learning Rate: 2e-5 (optimized for detailed responses)
- Batch Size: 8 per device
- Precision: bfloat16
- Max Sequence Length: 512 tokens
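To make the configuration concrete, the sketch below collects the listed hyperparameters in a dataclass and derives the number of optimizer steps. It assumes a single GPU, no gradient accumulation, and 5,584 training examples (90% of 6,205, rounded down); none of these three details is stated in the card, and `TrainingConfig` is a hypothetical name rather than a class from any training library.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    epochs: int = 5
    learning_rate: float = 2e-5
    per_device_batch_size: int = 8
    max_seq_length: int = 512
    precision: str = "bfloat16"
    num_devices: int = 1       # assumption: single GPU
    grad_accum_steps: int = 1  # assumption: no gradient accumulation

    def effective_batch_size(self):
        return self.per_device_batch_size * self.num_devices * self.grad_accum_steps

    def steps_per_epoch(self, num_examples):
        # The last partial batch still counts as one step
        return math.ceil(num_examples / self.effective_batch_size())

    def total_steps(self, num_examples):
        return self.epochs * self.steps_per_epoch(num_examples)


config = TrainingConfig()
train_examples = 5584  # assumption: 90% of 6,205, rounded down
print(config.steps_per_epoch(train_examples))  # 698
print(config.total_steps(train_examples))      # 3490
```

Under these assumptions each step processes at most 8 × 512 = 4,096 tokens.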
Improvements Over Base Model
| Aspect | Base Gemma 3 1B-IT | Neodac-mini |
|---|---|---|
| Cultural Accuracy | Hallucinations | Factually correct |
| Response Detail | Generic/brief | Rich & comprehensive |
| Regional Context | Limited knowledge | Deep cultural understanding |
| Tribal Information | Inaccurate/missing | Authentic representation |
Example Comparisons
Question: "What is Bihu festival?"
Base Model Response:
Claims Bihu is about Lord Shiva (incorrect)
Neodac-mini Response:
Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.
Use Cases
Cultural Education
- Educational institutions teaching Northeast India studies
- Cultural preservation initiatives
- Tourism and travel information
Research & Documentation
- Academic research on regional culture
- Cultural anthropology studies
- Digital heritage preservation
Community Applications
- Cultural chatbots for tourism
- Educational tools for diaspora communities
- Content creation for cultural media
Limitations
- Geographic Scope: Specialized for Northeast India only
- Language: Responses in English (cultural terms may be in local languages)
- Temporal Knowledge: Training data has a knowledge cutoff
- Bias Inheritance: May inherit biases from base model and training data
Evaluation & Performance
The model was evaluated on cultural accuracy, response completeness, and factual correctness. Significant improvements were observed over the base model in all cultural domains.
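The card does not describe the evaluation procedure, so the following is only a hypothetical sketch of one simple way to spot-check factual correctness: measuring how many expected key terms appear in a response. The function `keyword_coverage`, the term list, and the paraphrased base-model answer are illustrative assumptions, not the authors' method or data.

```python
def keyword_coverage(response, expected_terms):
    """Return the fraction of expected terms found in the response.

    Matching is case-insensitive substring search, a crude proxy for
    factual correctness that cannot replace expert review.
    """
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    text = response.lower()
    hits = [term for term in expected_terms if term.lower() in text]
    return len(hits) / len(expected_terms)


# Hypothetical reference terms for the question "What is Bihu festival?"
expected = ["Assam", "Rongali", "Kati", "Magh"]

neodac_answer = (
    "Bihu is the most important festival of Assam, celebrated by all "
    "Assamese people. There are three Bihus that mark different stages of "
    "the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati "
    "(or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter."
)
# Illustrative paraphrase of the incorrect base-model claim, not a real output
base_paraphrase = "Bihu is a festival dedicated to Lord Shiva."

print(keyword_coverage(neodac_answer, expected))    # 1.0
print(keyword_coverage(base_paraphrase, expected))  # 0.0
```

A keyword check like this only flags missing facts. It does not catch incorrect statements that happen to include the right terms, so expert or community review remains necessary.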
Citation
If you use Neodac-mini in your research or applications, please cite:
@misc{neodac2025,
  title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge},
  author={MWire Labs},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/MWirelabs/neodac-mini},
  note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
}
Contributing
Interested in improving Neodac-mini? We welcome:
- Additional cultural data from Northeast India
- Feedback on cultural accuracy
- Suggestions for new cultural domains
- Community validation of responses
License
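This model is released under the Apache 2.0 license. The base model, google/gemma-3-1b-it, is distributed under Google's Gemma Terms of Use rather than Apache 2.0, so use of this model is also subject to those terms.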
Acknowledgments
- Google for the Gemma 3 1B-IT base model
- Cultural experts and communities of Northeast India
- Contributors to the cultural dataset
- Hugging Face for the platform and tools
Neodac-mini represents a step forward in culturally-aware AI, preserving and making accessible the rich heritage of Northeast India through technology.