likhonsheikh
/

Meena

Model card Files Files and versions

xet

Community

likhonsheikh commited on Sep 16

Commit

f3df02a

verified ·

1 Parent(s): 7dabbde

🚀 Automatic deploy

Browse files

Files changed (2) hide show

README.md +269 -0
model-index.json +38 -0

README.md ADDED Viewed

	@@ -0,0 +1,269 @@

+# 🤖 Meena - Enterprise AI Pipeline
+[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/sheikh-vegeta/Meena/auto-train-publish.yml?branch=main&style=flat-square&logo=github&label=CI%2FCD)](https://github.com/sheikh-vegeta/Meena/actions)
+[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue?style=flat-square&logo=python)](https://python.org)
+[![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow?style=flat-square)](https://huggingface.co)
+[![License](https://img.shields.io/github/license/sheikh-vegeta/Meena?style=flat-square)](https://github.com/sheikh-vegeta/Meena/blob/main/LICENSE)
+[![GitHub stars](https://img.shields.io/github/stars/sheikh-vegeta/Meena?style=flat-square)](https://github.com/sheikh-vegeta/Meena/stargazers)
+<div align="center">
+**🌐 An enterprise-grade CI/CD pipeline for training, benchmarking, and deploying the Meena conversational AI model**
+*Designed with automation, efficiency, and multilingual support (Bengali & English) at its core*
+---
+> *"মানবিক সংলাপ, বাংলার ছোঁয়ায় – Building the future of multilingual conversational AI"*
+</div>
+## ✨ Features
+🚀 **Enterprise-Ready Pipeline**
+- ⚙️ **Automated CI/CD** – End-to-end automation with GitHub Actions
+- 🔍 **Smart Change Detection** – Runs only the jobs affected by your commits
+- 🔄 **Multi-environment Support** – Main, develop, and feature branch workflows
+🧠 **Advanced AI Training**
+- 🎯 **LoRA-based Fine-tuning** – Efficient training on models like DialoGPT
+- 📊 **Integrated Benchmarking** – Performance evaluation with comprehensive metrics
+- 🌍 **Multilingual Support** – Bengali (বাংলা) and English training datasets
+📦 **Professional Deployment**
+- 🤗 **Hugging Face Integration** – Automatic model publishing and versioning
+- 📝 **Auto-generated Model Cards** – Detailed documentation for every model release
+- 🚀 **GitHub Releases** – Automated versioning and artifact management
+- 🔔 **Smart Notifications** – Keep your team updated on pipeline status
+## 🛠️ Architecture Overview
+```mermaid
+flowchart TD
+    A[🔄 Push to Repository] --> B[🕵️ Change Detection]
+    B --> C{📝 Changes Detected?}
+    C -->|Training Scripts| D[🎓 Model Training]
+    C -->|Benchmark Scripts| E[📈 Performance Evaluation]
+    C -->|Config Changes| F[⚙️ Pipeline Update]
+    D --> G[📊 Training Metrics]
+    E --> H[📈 Benchmark Results]
+    G --> I[🚀 Model Publishing]
+    H --> I
+    I --> J[🤗 Hugging Face Hub]
+    I --> K[📦 GitHub Release]
+    J --> L[🧪 Inference Testing]
+    K --> L
+    L --> M[✅ Quality Gates]
+    M --> N[🔔 Team Notification]
+    style A fill:#e1f5fe
+    style D fill:#f3e5f5
+    style E fill:#fff3e0
+    style I fill:#e8f5e8
+    style N fill:#fce4ec
+```
+## 🚀 Quick Start
+### Prerequisites
+```bash
+# Python 3.8+ required
+python --version
+# Install dependencies
+pip install -r requirements.txt
+```
+### Local Development
+```bash
+# Clone the repository
+git clone https://github.com/sheikh-vegeta/Meena.git
+cd Meena
+# Set up environment
+python -m venv meena-env
+source meena-env/bin/activate  # On Windows: meena-env\Scripts\activate
+# Install requirements
+pip install -r requirements.txt
+# Run training locally
+python train.py
+# Run benchmarking
+python benchmark.py
+```
+## 📋 CI/CD Pipeline Details
+The complete automation logic is defined in `.github/workflows/auto-train-publish.yml`
+### 🎯 Trigger Conditions
+- ✅ **Push to `main`** – Full pipeline execution
+- ✅ **Push to `develop`** – Training and benchmarking only
+- ✅ **Pull Requests** – Validation and testing
+- ✅ **Manual Dispatch** – On-demand execution via GitHub Actions
+### 🔄 Pipeline Jobs
+| Job | Description | Triggers |
+|-----|-------------|----------|
+| 🕵️ **detect-changes** | Analyzes git diff to determine required pipeline stages | Always |
+| 🎓 **train** | Executes model training with LoRA fine-tuning | Training scripts modified |
+| 📈 **benchmark** | Runs performance evaluation and generates metrics | Model or benchmark changes |
+| 🚀 **publish** | Publishes to Hugging Face Hub & creates GitHub release | Successful training completion |
+| 🧪 **test** | Validates deployed model via Inference API | Post-deployment |
+| 🔔 **notify** | Sends pipeline status to configured channels | Pipeline completion |
+## 📊 Benchmarking & Metrics
+Meena includes comprehensive evaluation metrics:
+- 📈 **Perplexity Scores** – Language model quality assessment
+- 🎯 **BLEU Scores** – Translation and generation quality
+- 🗣️ **Conversational Metrics** – Dialogue coherence and relevance
+- ⚡ **Performance Benchmarks** – Inference speed and memory usage
+> *বাংলা মেট্রিক্স:* "আমাদের বেঞ্চমার্কিং ��িস্টেম বাংলা ভাষার জন্য বিশেষভাবে অপ্টিমাইজ করা হয়েছে।"
+## 🌍 Multilingual Support
+### Bengali (বাংলা) Integration
+- 📚 **Native Bengali Datasets** – Curated conversational data
+- 🔤 **Proper Tokenization** – Bengali script-aware processing
+- 🎭 **Cultural Context** – Bengali idioms and expressions
+- ✅ **Quality Assurance** – Bengali-specific evaluation metrics
+### Training Data Structure
+```
+datasets/
+├── bengali/
+│   ├── conversations.json
+│   ├── formal_dialogues.json
+│   └── casual_chat.json
+├── english/
+│   ├── dialogpt_data.json
+│   └── general_conversations.json
+└── mixed/
+    └── bilingual_pairs.json
+```
+## 🚀 Model Publishing Workflow
+### Automatic Publishing
+1. 🎯 **Training Completion** – Model artifacts generated
+2. 📝 **Model Card Generation** – Documentation created automatically
+3. 🤗 **Hugging Face Upload** – Model pushed to Hub with versioning
+4. 📦 **GitHub Release** – Tagged release with artifacts
+5. 🧪 **Validation Testing** – Inference API smoke tests
+### Model Card Features
+- 📊 **Performance Metrics** – Comprehensive benchmark results
+- 🎯 **Use Cases** – Detailed application scenarios
+- ⚠️ **Limitations** – Honest assessment of model boundaries
+- 📜 **Training Details** – Complete training configuration
+- 🌐 **Language Support** – Bengali and English capabilities
+## 🛡️ Quality Assurance
+### Automated Testing
+- ✅ **Unit Tests** – Core functionality validation
+- 🔄 **Integration Tests** – End-to-end pipeline verification
+- 🧪 **Model Validation** – Output quality assessment
+- 📈 **Performance Regression** – Benchmark comparison
+### Code Quality
+- 🔍 **Linting** – PEP 8 compliance with flake8
+- 🎨 **Formatting** – Automatic formatting with black
+- 📚 **Documentation** – Comprehensive docstrings
+- 🔒 **Security Scanning** – Dependency vulnerability checks
+## 🔔 Notification System
+Stay updated with intelligent notifications:
+- 📧 **Email Alerts** – Critical pipeline failures
+- 💬 **Slack Integration** – Team channel updates
+- 🚨 **Discord Webhooks** – Community notifications
+- 📱 **GitHub Notifications** – Built-in issue tracking
+## 🤝 Contributing
+We welcome contributions! Please follow these guidelines:
+### Development Workflow
+1. 🍴 **Fork** the repository
+2. 🌿 **Create** a feature branch (`git checkout -b feature/amazing-feature`)
+3. ✅ **Test** your changes thoroughly
+4. 📝 **Commit** with descriptive messages
+5. 🚀 **Push** to your branch
+6. 📬 **Open** a Pull Request
+### Contribution Areas
+- 🧠 **Model Improvements** – Better architectures and training techniques
+- 🌐 **Language Support** – Additional language integrations
+- 📊 **Benchmarking** – New evaluation metrics and datasets
+- 🔧 **Infrastructure** – Pipeline optimizations and tooling
+- 📚 **Documentation** – Tutorials, guides, and examples
+> *অবদানকারীদের জন্য:* "আপনার অবদান বাংলা AI-এর ভবিষ্যৎ গড়তে সাহায্য করবে।"
+## 📁 Repository Structure
+```
+Meena/
+├── 📄 README.md                    # This file
+├── ⚙️ requirements.txt            # Python dependencies
+├── 🧪 train.py                    # Model training script
+├── 📊 benchmark.py                # Performance evaluation
+├── 📝 generate_model_card.py      # Documentation generation
+├── 🔧 config/
+│   ├── training_config.yaml       # Training parameters
+│   ├── model_config.yaml          # Model architecture
+│   └── benchmark_config.yaml      # Evaluation settings
+├── 📚 datasets/
+│   ├── bengali/                   # Bengali training data
+│   ├── english/                   # English training data
+│   └── mixed/                     # Bilingual datasets
+├── 🎯 models/
+│   ├── base/                      # Base model checkpoints
+│   └── fine_tuned/               # Fine-tuned outputs
+├── 📈 benchmarks/
+│   ├── results/                   # Benchmark outputs
+│   └── metrics/                  # Performance data
+└── 🔄 .github/
+    └── workflows/
+        └── auto-train-publish.yml # CI/CD pipeline
+```
+## 📜 License
+This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSE) file for details.
+## 🌟 Acknowledgments
+- 🤗 **Hugging Face** – For the incredible transformers library and model hub
+- 🧠 **OpenAI** – For inspiring conversational AI research
+- 🌐 **Bengali NLP Community** – For dataset contributions and feedback
+- 👥 **Contributors** – Everyone who has helped improve Meena
+## 📞 Support & Contact
+- 📧 **Issues**: [GitHub Issues](https://github.com/sheikh-vegeta/Meena/issues)
+- 💬 **Discussions**: [GitHub Discussions](https://github.com/sheikh-vegeta/Meena/discussions)
+- 🌐 **Documentation**: [Wiki](https://github.com/sheikh-vegeta/Meena/wiki)
+---
+<div align="center">
+**🔮 Building the Future of Multilingual Conversational AI**
+*Made with ❤️ by the Meena Team*
+**[⭐ Star this repo](https://github.com/sheikh-vegeta/Meena) if you found it helpful!**
+</div>

model-index.json ADDED Viewed

	@@ -0,0 +1,38 @@

+{
+  "language": ["en", "bn"],
+  "license": "apache-2.0",
+  "tags": [
+    "meena",
+    "conversational-ai",
+    "multilingual",
+    "bengali",
+    "fine-tuned",
+    "enterprise-pipeline"
+  ],
+  "pipeline_tag": "text-generation",
+  "library_name": "transformers",
+  "model-index": [
+    {
+      "name": "Meena",
+      "results": [
+        {
+          "model": {
+            "name": "likhonsheikh/Meena"
+          },
+          "metrics": [
+            {
+              "name": "Perplexity",
+              "type": "perplexity",
+              "value": "N/A"
+            }
+          ],
+          "tasks": [
+            {
+              "type": "text-generation"
+            }
+          ]
+        }
+      ]
+    }
+  ]
+}