Neodac-mini: Northeast India Cultural AI Model

Neodac-mini (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region.

🎯 Model Overview

  • Base Model: google/gemma-3-1b-it
  • Specialization: Northeast India Cultural Knowledge
  • Training Data: 6,205 culturally authentic Q&A pairs
  • Coverage: All 8 Northeast Indian states
  • Languages: English (with cultural context)

🌟 Key Features

Cultural Domains Covered

  • Festivals & Celebrations: Bihu, Hornbill, Losar, Chapchar Kut, etc.
  • Traditional Arts: Dance forms, music, crafts, weaving
  • Cuisine: Regional foods, cooking methods, traditional recipes
  • Tribal Heritage: Community practices, languages, customs
  • Geography: Cultural significance of places and landmarks
  • Literature: Folk tales, oral traditions, regional literature

Model Capabilities

  • ✅ Accurate cultural information without hallucinations
  • ✅ Detailed responses about regional traditions
  • ✅ Authentic representation of tribal communities
  • ✅ Contextual understanding of cultural nuances
  • ✅ Preservation of cultural knowledge through AI

🚀 Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini")
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/neodac-mini",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage
def ask_neodac_mini(question):
    # Gemma chat format: one user turn followed by an open model turn
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
    # Move the inputs to the device the model was placed on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=300,  # limit on generated tokens, excluding the prompt
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens; the turn markers are special
    # tokens and are stripped by skip_special_tokens, so splitting on them fails
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Ask about Northeast India culture
response = ask_neodac_mini("What is the significance of bamboo in Northeast India?")
print(response)
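
For follow-up questions, the same turn markers used in the Quick Start prompt can be repeated for each earlier exchange. The helper below is a minimal sketch of that idea in plain Python; `build_gemma_prompt` and the sample history are hypothetical names introduced for illustration and are not part of the released model or of any library.

```python
def build_gemma_prompt(turns):
    """Render (role, text) pairs into the turn format used in the Quick Start.

    Hypothetical helper: `turns` is a list of ("user" | "model", text) tuples.
    The returned string ends with an open model turn so generation continues there.
    """
    allowed = {"user", "model"}
    parts = []
    for role, text in turns:
        if role not in allowed:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave the final model turn open for the model to complete.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


# Illustrative conversation history (assumed, not real model output).
history = [
    ("user", "What is Bihu festival?"),
    ("model", "Bihu is the most important festival of Assam."),
    ("user", "Which Bihu is celebrated in spring?"),
]
prompt = build_gemma_prompt(history)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the single-turn prompt. Keeping the history short matters here, since the model was trained with a 512-token maximum sequence length.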

📊 Training Details

Dataset

  • Size: 6,205 cultural Q&A pairs
  • Sources: Regional cultural databases, wiki content, expert curation
  • Quality: Manually verified for cultural authenticity
  • Split: 90% training, 10% validation
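
To make the 90/10 split concrete, here is a minimal sketch of a reproducible split over 6,205 records. The shuffling approach, the seed, the `train_val_split` name, and the placeholder records are all assumptions for illustration; the card does not say how the actual split was produced.

```python
import random


def train_val_split(pairs, val_percent=10, seed=0):
    """Shuffle deterministically and hold out val_percent% of pairs for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_val = len(shuffled) * val_percent // 100
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder records standing in for the 6,205 Q&A pairs.
pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(6205)]
train, val = train_val_split(pairs)
print(len(train), len(val))  # 5585 620
```

With integer division, 10% of 6,205 rounds down to 620 validation pairs, leaving 5,585 for training.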

Training Configuration

  • Hardware: NVIDIA A40 40GB
  • Epochs: 5 (increased from an initial 3)
  • Learning Rate: 2e-5 (optimized for detailed responses)
  • Batch Size: 8 per device
  • Precision: bfloat16
  • Max Sequence Length: 512 tokens
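
As a rough check on training length, the configuration above can be turned into an optimizer-step estimate. This sketch assumes a single device, no gradient accumulation, and about 5,585 training pairs (90% of 6,205); none of these are stated in the card, and the dictionary keys are illustrative names rather than any library's API.

```python
import math

# Values taken from the list above; key names are illustrative only.
config = {
    "epochs": 5,
    "learning_rate": 2e-5,
    "per_device_batch_size": 8,
    "precision": "bfloat16",
    "max_sequence_length": 512,
}


def optimizer_steps(n_examples, batch_size, epochs, n_devices=1, grad_accum=1):
    """Return (steps per epoch, total steps), counting a final partial batch."""
    effective_batch = batch_size * n_devices * grad_accum
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs


n_train = 6205 - 6205 // 10  # assumed training set size: 5585
per_epoch, total = optimizer_steps(
    n_train, config["per_device_batch_size"], config["epochs"]
)
print(per_epoch, total)  # 699 3495
```

Under these assumptions the run comes to roughly 3,500 optimizer steps; more devices or gradient accumulation would reduce that count proportionally.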

Improvements Over Base Model

| Aspect | Base Gemma 3 1B-IT | Neodac-mini |
|---|---|---|
| Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
| Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
| Regional Context | ❌ Limited knowledge | ✅ Deep cultural understanding |
| Tribal Information | ❌ Inaccurate/missing | ✅ Authentic representation |

🎪 Example Comparisons

Question: "What is Bihu festival?"

Base Model Response:

Claims Bihu is about Lord Shiva (incorrect)

Neodac-mini Response:

Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.

🎯 Use Cases

Cultural Education

  • Educational institutions teaching Northeast India studies
  • Cultural preservation initiatives
  • Tourism and travel information

Research & Documentation

  • Academic research on regional culture
  • Cultural anthropology studies
  • Digital heritage preservation

Community Applications

  • Cultural chatbots for tourism
  • Educational tools for diaspora communities
  • Content creation for cultural media

⚠️ Limitations

  • Geographic Scope: Specialized for Northeast India only
  • Language: Responses in English (cultural terms may be in local languages)
  • Temporal Knowledge: Training data has a knowledge cutoff
  • Bias Inheritance: May inherit biases from base model and training data

🔬 Evaluation & Performance

The model was evaluated on cultural accuracy, response completeness, and factual correctness. Significant improvements were observed over the base model in all cultural domains.
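
The card does not describe the evaluation procedure. As one possible way to spot-check factual correctness, the sketch below scores a response by the fraction of expected key terms it mentions. The `keyword_coverage` function, the term list, and the base-model stand-in sentence are all hypothetical; this is not the method used to evaluate Neodac-mini.

```python
def keyword_coverage(response, expected_terms):
    """Fraction of expected terms that appear in the response (case-insensitive)."""
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    text = response.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits / len(expected_terms)


# Terms a correct answer about Bihu would be expected to mention (assumed list).
expected = ["Assam", "Rongali", "Kati", "Magh"]

# Response quoted in the Example Comparisons section.
tuned = (
    "Bihu is the most important festival of Assam, celebrated by all Assamese "
    "people. There are three Bihus that mark different stages of the "
    "agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or "
    "Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter."
)
# Stand-in paraphrase of the base model's incorrect claim, not a real output.
base = "Bihu is a festival dedicated to Lord Shiva."

print(f"tuned={keyword_coverage(tuned, expected):.2f} "
      f"base={keyword_coverage(base, expected):.2f}")
```

Keyword matching only catches missing facts, not wrong ones stated alongside the right terms, so a check like this would complement rather than replace review by people familiar with the culture.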

📜 Citation

If you use Neodac-mini in your research or applications, please cite:

@misc{neodac2025,
  title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge},
  author={MWire Labs},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/MWirelabs/neodac-mini},
  note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
}

🀝 Contributing

Interested in improving Neodac-mini? We welcome:

  • Additional cultural data from Northeast India
  • Feedback on cultural accuracy
  • Suggestions for new cultural domains
  • Community validation of responses

📄 License

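This model is released under the Apache 2.0 license. The base Gemma model is distributed under Google's Gemma Terms of Use rather than Apache 2.0, so those terms also need to be checked before redistributing or deploying this model.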

🙏 Acknowledgments

  • Google for the Gemma 3 1B-IT base model
  • Cultural experts and communities of Northeast India
  • Contributors to the cultural dataset
  • Hugging Face for the platform and tools

Neodac-mini represents a step forward in culturally-aware AI, preserving and making accessible the rich heritage of Northeast India through technology.
