likhonsheikh committed on
Commit f3df02a · verified · 1 Parent(s): 7dabbde

🚀 Automatic deploy

Files changed (2)
  1. README.md +269 -0
  2. model-index.json +38 -0
README.md ADDED
@@ -0,0 +1,269 @@
+ # 🤖 Meena - Enterprise AI Pipeline
+
+ [![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/sheikh-vegeta/Meena/auto-train-publish.yml?branch=main&style=flat-square&logo=github&label=CI%2FCD)](https://github.com/sheikh-vegeta/Meena/actions)
+ [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue?style=flat-square&logo=python)](https://python.org)
+ [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-yellow?style=flat-square)](https://huggingface.co)
+ [![License](https://img.shields.io/github/license/sheikh-vegeta/Meena?style=flat-square)](https://github.com/sheikh-vegeta/Meena/blob/main/LICENSE)
+ [![GitHub stars](https://img.shields.io/github/stars/sheikh-vegeta/Meena?style=flat-square)](https://github.com/sheikh-vegeta/Meena/stargazers)
+
+ <div align="center">
+
+ **🌐 An enterprise-grade CI/CD pipeline for training, benchmarking, and deploying the Meena conversational AI model**
+
+ *Designed with automation, efficiency, and multilingual support (Bengali & English) at its core*
+
+ ---
+
+ > *"Human conversation, with a Bengali touch – Building the future of multilingual conversational AI"*
+
+ </div>
+
+ ## ✨ Features
+
+ 🚀 **Enterprise-Ready Pipeline**
+ - ⚙️ **Automated CI/CD** – End-to-end automation with GitHub Actions
+ - 🔍 **Smart Change Detection** – Runs only the jobs affected by your commits
+ - 🔄 **Multi-environment Support** – Main, develop, and feature branch workflows
+
+ 🧠 **Advanced AI Training**
+ - 🎯 **LoRA-based Fine-tuning** – Efficient training on models like DialoGPT
+ - 📊 **Integrated Benchmarking** – Performance evaluation with comprehensive metrics
+ - 🌐 **Multilingual Support** – Bengali (বাংলা) and English training datasets
+
+ 📦 **Professional Deployment**
+ - 🤗 **Hugging Face Integration** – Automatic model publishing and versioning
+ - 📝 **Auto-generated Model Cards** – Detailed documentation for every model release
+ - 🚀 **GitHub Releases** – Automated versioning and artifact management
+ - 🔔 **Smart Notifications** – Keep your team updated on pipeline status
+
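As a rough illustration of why the LoRA-based fine-tuning listed above is described as efficient, the sketch below compares trainable parameter counts for a full weight matrix and for a rank-`r` LoRA update. The layer shape (768 x 768) and the rank (8) are illustrative assumptions, not values taken from this repository's configuration, and the function names are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical shapes: one 768 x 768 projection, LoRA rank 8.
    d_out, d_in, rank = 768, 768, 8
    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4f}")
```

Under these assumed shapes the LoRA update trains roughly 2% of the parameters of the full matrix; the real saving depends on which layers are adapted and on the rank chosen in the training configuration.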
+ ## 🛠️ Architecture Overview
+
+ ```mermaid
+ flowchart TD
+     A[🔄 Push to Repository] --> B[🕵️ Change Detection]
+     B --> C{📝 Changes Detected?}
+     C -->|Training Scripts| D[🎓 Model Training]
+     C -->|Benchmark Scripts| E[📈 Performance Evaluation]
+     C -->|Config Changes| F[⚙️ Pipeline Update]
+
+     D --> G[📊 Training Metrics]
+     E --> H[📈 Benchmark Results]
+
+     G --> I[🚀 Model Publishing]
+     H --> I
+     I --> J[🤗 Hugging Face Hub]
+     I --> K[📦 GitHub Release]
+
+     J --> L[🧪 Inference Testing]
+     K --> L
+     L --> M[✅ Quality Gates]
+     M --> N[🔔 Team Notification]
+
+     style A fill:#e1f5fe
+     style D fill:#f3e5f5
+     style E fill:#fff3e0
+     style I fill:#e8f5e8
+     style N fill:#fce4ec
+ ```
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+ ```bash
+ # Python 3.8+ required
+ python --version
+
+ # Install dependencies
+ pip install -r requirements.txt
+ ```
+
+ ### Local Development
+ ```bash
+ # Clone the repository
+ git clone https://github.com/sheikh-vegeta/Meena.git
+ cd Meena
+
+ # Set up environment
+ python -m venv meena-env
+ source meena-env/bin/activate # On Windows: meena-env\Scripts\activate
+
+ # Install requirements
+ pip install -r requirements.txt
+
+ # Run training locally
+ python train.py
+
+ # Run benchmarking
+ python benchmark.py
+ ```
+
+ ## 📋 CI/CD Pipeline Details
+
+ The complete automation logic is defined in `.github/workflows/auto-train-publish.yml`.
+
+ ### 🎯 Trigger Conditions
+ - ✅ **Push to `main`** – Full pipeline execution
+ - ✅ **Push to `develop`** – Training and benchmarking only
+ - ✅ **Pull Requests** – Validation and testing
+ - ✅ **Manual Dispatch** – On-demand execution via GitHub Actions
+
+ ### 🔄 Pipeline Jobs
+
+ | Job | Description | Triggers |
+ |-----|-------------|----------|
+ | 🕵️ **detect-changes** | Analyzes git diff to determine required pipeline stages | Always |
+ | 🎓 **train** | Executes model training with LoRA fine-tuning | Training scripts modified |
+ | 📈 **benchmark** | Runs performance evaluation and generates metrics | Model or benchmark changes |
+ | 🚀 **publish** | Publishes to Hugging Face Hub & creates GitHub release | Successful training completion |
+ | 🧪 **test** | Validates deployed model via Inference API | Post-deployment |
+ | 🔔 **notify** | Sends pipeline status to configured channels | Pipeline completion |
+
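The `detect-changes` job in the table above can be thought of as a mapping from changed file paths to the jobs that should run. The sketch below shows one way to express that in Python. The path patterns and the helper name `jobs_for_changes` are assumptions based on the repository layout described in this README; the actual rules live in `.github/workflows/auto-train-publish.yml` and may differ.

```python
from fnmatch import fnmatch

# Hypothetical path patterns per job; not taken from the real workflow file.
JOB_PATTERNS = {
    "train": ["train.py", "config/training_config.yaml", "datasets/*"],
    "benchmark": ["benchmark.py", "config/benchmark_config.yaml", "models/*"],
}


def jobs_for_changes(changed_paths):
    """Return the sorted list of jobs whose patterns match any changed path."""
    selected = set()
    for path in changed_paths:
        for job, patterns in JOB_PATTERNS.items():
            if any(fnmatch(path, pattern) for pattern in patterns):
                selected.add(job)
    return sorted(selected)


if __name__ == "__main__":
    # Paths would normally come from `git diff --name-only <base> <head>`.
    print(jobs_for_changes(["train.py", "README.md"]))
```

Keeping the patterns in one table makes it easy to see which edits trigger an expensive training run and which, such as a README change, trigger nothing.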
+ ## 📊 Benchmarking & Metrics
+
+ Meena includes comprehensive evaluation metrics:
+
+ - 📈 **Perplexity Scores** – Language model quality assessment
+ - 🎯 **BLEU Scores** – Translation and generation quality
+ - 🗣️ **Conversational Metrics** – Dialogue coherence and relevance
+ - ⚡ **Performance Benchmarks** – Inference speed and memory usage
+
+ > *Bengali metrics:* "Our benchmarking system has been specially optimized for the Bengali language."
+
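For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below computes it from a list of per-token natural-log probabilities. How `benchmark.py` actually obtains or aggregates these values is not shown in this commit, so the function name and input format here are assumptions.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean(log p(token))) for natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4
    # (up to floating-point rounding).
    print(perplexity([math.log(0.25)] * 10))
```

Lower values are better: a perplexity of 4 means the model is, on average, as uncertain as a uniform choice among four tokens.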
+ ## 🌐 Multilingual Support
+
+ ### Bengali (বাংলা) Integration
+ - 📚 **Native Bengali Datasets** – Curated conversational data
+ - 🔤 **Proper Tokenization** – Bengali script-aware processing
+ - 🎭 **Cultural Context** – Bengali idioms and expressions
+ - ✅ **Quality Assurance** – Bengali-specific evaluation metrics
+
+ ### Training Data Structure
+ ```
+ datasets/
+ ├── bengali/
+ │   ├── conversations.json
+ │   ├── formal_dialogues.json
+ │   └── casual_chat.json
+ ├── english/
+ │   ├── dialogpt_data.json
+ │   └── general_conversations.json
+ └── mixed/
+     └── bilingual_pairs.json
+ ```
+
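A loader for the layout above might start by grouping the JSON files by language directory. This is a minimal sketch: the function name `list_dataset_files` is hypothetical, and it deliberately does not parse the files, since their schema is not documented in this README.

```python
from pathlib import Path


def list_dataset_files(root):
    """Map each language sub-directory of `root` to its sorted .json file names."""
    files_by_language = {}
    for lang_dir in sorted(Path(root).iterdir()):
        if lang_dir.is_dir():
            files_by_language[lang_dir.name] = sorted(
                p.name for p in lang_dir.glob("*.json")
            )
    return files_by_language


if __name__ == "__main__":
    root = Path("datasets")
    # Only list files when run from a checkout that contains datasets/.
    if root.is_dir():
        for language, names in list_dataset_files(root).items():
            print(language, names)
```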
+ ## 🚀 Model Publishing Workflow
+
+ ### Automatic Publishing
+ 1. 🎯 **Training Completion** – Model artifacts generated
+ 2. 📝 **Model Card Generation** – Documentation created automatically
+ 3. 🤗 **Hugging Face Upload** – Model pushed to Hub with versioning
+ 4. 📦 **GitHub Release** – Tagged release with artifacts
+ 5. 🧪 **Validation Testing** – Inference API smoke tests
+
+ ### Model Card Features
+ - 📊 **Performance Metrics** – Comprehensive benchmark results
+ - 🎯 **Use Cases** – Detailed application scenarios
+ - ⚠️ **Limitations** – Honest assessment of model boundaries
+ - 📜 **Training Details** – Complete training configuration
+ - 🌐 **Language Support** – Bengali and English capabilities
+
+ ## 🛡️ Quality Assurance
+
+ ### Automated Testing
+ - ✅ **Unit Tests** – Core functionality validation
+ - 🔄 **Integration Tests** – End-to-end pipeline verification
+ - 🧪 **Model Validation** – Output quality assessment
+ - 📈 **Performance Regression** – Benchmark comparison
+
+ ### Code Quality
+ - 🔍 **Linting** – PEP 8 compliance with flake8
+ - 🎨 **Formatting** – Automatic formatting with black
+ - 📚 **Documentation** – Comprehensive docstrings
+ - 🔒 **Security Scanning** – Dependency vulnerability checks
+
+ ## 🔔 Notification System
+
+ Stay updated with intelligent notifications:
+
+ - 📧 **Email Alerts** – Critical pipeline failures
+ - 💬 **Slack Integration** – Team channel updates
+ - 🚨 **Discord Webhooks** – Community notifications
+ - 📱 **GitHub Notifications** – Built-in issue tracking
+
+ ## 🤝 Contributing
+
+ We welcome contributions! Please follow these guidelines:
+
+ ### Development Workflow
+ 1. 🍴 **Fork** the repository
+ 2. 🌿 **Create** a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. ✅ **Test** your changes thoroughly
+ 4. 📝 **Commit** with descriptive messages
+ 5. 🚀 **Push** to your branch
+ 6. 📬 **Open** a Pull Request
+
+ ### Contribution Areas
+ - 🧠 **Model Improvements** – Better architectures and training techniques
+ - 🌐 **Language Support** – Additional language integrations
+ - 📊 **Benchmarking** – New evaluation metrics and datasets
+ - 🔧 **Infrastructure** – Pipeline optimizations and tooling
+ - 📚 **Documentation** – Tutorials, guides, and examples
+
+ > *For contributors:* "Your contribution will help build the future of Bengali AI."
+
+ ## 📁 Repository Structure
+
+ ```
+ Meena/
+ ├── 📄 README.md                    # This file
+ ├── ⚙️ requirements.txt             # Python dependencies
+ ├── 🧪 train.py                     # Model training script
+ ├── 📊 benchmark.py                 # Performance evaluation
+ ├── 📝 generate_model_card.py       # Documentation generation
+ ├── 🔧 config/
+ │   ├── training_config.yaml        # Training parameters
+ │   ├── model_config.yaml           # Model architecture
+ │   └── benchmark_config.yaml       # Evaluation settings
+ ├── 📚 datasets/
+ │   ├── bengali/                    # Bengali training data
+ │   ├── english/                    # English training data
+ │   └── mixed/                      # Bilingual datasets
+ ├── 🎯 models/
+ │   ├── base/                       # Base model checkpoints
+ │   └── fine_tuned/                 # Fine-tuned outputs
+ ├── 📈 benchmarks/
+ │   ├── results/                    # Benchmark outputs
+ │   └── metrics/                    # Performance data
+ └── 🔄 .github/
+     └── workflows/
+         └── auto-train-publish.yml  # CI/CD pipeline
+ ```
+
+ ## 📜 License
+
+ This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSE) file for details.
+
+ ## 🌟 Acknowledgments
+
+ - 🤗 **Hugging Face** – For the incredible transformers library and model hub
+ - 🧠 **OpenAI** – For inspiring conversational AI research
+ - 🌐 **Bengali NLP Community** – For dataset contributions and feedback
+ - 👥 **Contributors** – Everyone who has helped improve Meena
+
+ ## 📞 Support & Contact
+
+ - 📧 **Issues**: [GitHub Issues](https://github.com/sheikh-vegeta/Meena/issues)
+ - 💬 **Discussions**: [GitHub Discussions](https://github.com/sheikh-vegeta/Meena/discussions)
+ - 🌐 **Documentation**: [Wiki](https://github.com/sheikh-vegeta/Meena/wiki)
+
+ ---
+
+ <div align="center">
+
+ **🔮 Building the Future of Multilingual Conversational AI**
+
+ *Made with ❤️ by the Meena Team*
+
+ **[⭐ Star this repo](https://github.com/sheikh-vegeta/Meena) if you found it helpful!**
+
+ </div>
model-index.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "language": ["en", "bn"],
+   "license": "apache-2.0",
+   "tags": [
+     "meena",
+     "conversational-ai",
+     "multilingual",
+     "bengali",
+     "fine-tuned",
+     "enterprise-pipeline"
+   ],
+   "pipeline_tag": "text-generation",
+   "library_name": "transformers",
+   "model-index": [
+     {
+       "name": "Meena",
+       "results": [
+         {
+           "model": {
+             "name": "likhonsheikh/Meena"
+           },
+           "metrics": [
+             {
+               "name": "Perplexity",
+               "type": "perplexity",
+               "value": "N/A"
+             }
+           ],
+           "tasks": [
+             {
+               "type": "text-generation"
+             }
+           ]
+         }
+       ]
+     }
+   ]
+ }
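The only metric in this file carries the placeholder string `"N/A"` rather than a number. A small pre-publish check could flag such placeholders before the file is uploaded. The sketch below walks the structure exactly as it appears in this commit; the helper name `non_numeric_metrics` is hypothetical, and whether the Hub requires numeric metric values is an assumption worth verifying against the model card metadata documentation.

```python
import json


def non_numeric_metrics(model_index_doc):
    """Return (model name, metric type) pairs whose value is not a number."""
    problems = []
    for entry in model_index_doc.get("model-index", []):
        for result in entry.get("results", []):
            for metric in result.get("metrics", []):
                value = metric.get("value")
                # bool is a subclass of int, so exclude it explicitly.
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    problems.append((entry.get("name"), metric.get("type")))
    return problems


if __name__ == "__main__":
    # Inline sample mirroring the relevant part of model-index.json.
    sample = json.loads(
        '{"model-index": [{"name": "Meena", "results": [{"metrics": '
        '[{"name": "Perplexity", "type": "perplexity", "value": "N/A"}]}]}]}'
    )
    print(non_numeric_metrics(sample))  # prints [('Meena', 'perplexity')]
```

In a pipeline, a non-empty result could fail the publish job or simply be logged until the benchmark stage writes real values.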