MALIBA-LLM: Bambara Large Language Model [experimental]

MALIBA-LLM is a fine-tuned version of google/gemma-3n-E2B-it for instruction following, text generation, and language understanding in Bambara (Bamanankan). As the first open-source large language model for Bambara, a language spoken by over 15 million people in West Africa, it aims to enhance AI accessibility for Bambara-speaking communities in education, information retrieval, and digital inclusion.

This model supports Bambara with French/English code-switching for technical terms, reflecting natural linguistic patterns. It forms part of the broader MALIBA-AI project to advance AI for Malian languages.

Model Details

  • Base Model: google/gemma-3n-E2B-it (instruction-tuned, multimodal-capable foundation)
  • Adapter: LoRA (Low-Rank Adaptation)
  • Parameters: Effective 2B (compressed via MatFormer architecture)
  • Primary Language: Bambara (Bamanankan)
  • Additional Languages: All languages supported by the Gemma-3n foundation model
  • Context Window: 4,096 tokens
  • Core Capabilities:
    • Instruction following
    • Conversational reasoning
    • Knowledge retrieval
    • Content generation in Bambara
    • Translation (Bambara ↔ French/English)
    • Mathematical reasoning
    • Coding support
    • Logical problem-solving
    • Mali-specific knowledge (history, institutions, administration)
  • License: MIT

Intended Uses

This model is intended for research and development in low-resource NLP, particularly:

  • Generating Bambara-language educational and informational content
  • Enabling conversational AI interfaces for Bambara speakers
  • Supporting preliminary translation tasks involving Bambara
  • Facilitating access to AI tools in underserved West African regions
  • Serving as a base for further fine-tuning in domain-specific applications (see the sketch below)

This work contributes to a broader global push to ensure low-resource languages are not left behind in the AI era.
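
As a starting point for domain-specific fine-tuning, one option is to attach a fresh LoRA adapter to the same base model. The sketch below is an assumption-laden example, not the project's training pipeline: it simply reuses the LoRA hyperparameters and target-module regex from the Axolotl configuration listed later in this card.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Sketch only: the LoRA hyperparameters and the target-module regex mirror the
# Axolotl configuration further down in this card; adapt them to your own data and hardware.
base = AutoModelForCausalLM.from_pretrained("google/gemma-3n-E2B-it")
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=r"model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

From here, any standard supervised fine-tuning loop (for example the transformers Trainer, or Axolotl as used for this model) can be applied to domain-specific chat data.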

Limitations

As an early-stage model for a low-resource language:

  • Performance degradation over long conversations, where contextual tracking and coherence may gradually decline
  • Reliance on French code-switching for advanced vocabulary, which may not align with pure Bambara preferences
  • Potential grammatical inconsistencies in longer or intricate outputs
  • Inherited biases from base model and training data; lacks Bambara-specific safeguards
  • Experimental status: outputs should be verified before practical use

Despite these limitations, the model retains all core capabilities of Gemma-3n, with additional instruction-following and conversational strength in Bambara.

Training Data

The model was fine-tuned on a cleaned subset of the MALIBA-Instructions dataset (1M examples). For comprehensive details on the dataset, including sources and preparation, please refer to the repository.
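
The dataset path, subset name, and split used for training appear in the Axolotl configuration below. A minimal loading sketch, assuming you have access to the dataset on the Hugging Face Hub:

from datasets import load_dataset

# Load the cleaned training split referenced in the training configuration below.
# Records are chat-formatted under a "messages" field with "role"/"content" keys.
ds = load_dataset("sudoping01/bambara-instructions", name="cleaned", split="train")
print(ds[0]["messages"])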

Evaluation

Evaluated on a 1% validation split, the model achieved a final validation loss of 0.4952 (93.4% reduction from initial 7.4595). Human assessments by native speakers indicate reasonable quality in conversational and knowledge-based tasks.
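
For intuition, and assuming the reported value is a mean token-level cross-entropy in nats, the final validation loss corresponds to a perplexity of roughly 1.64:

import math

# Perplexity implied by the final validation loss, assuming mean cross-entropy in nats.
final_val_loss = 0.4952
print(math.exp(final_val_loss))  # ~1.64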

Training Procedure

Supervised fine-tuning was conducted with distributed training across 8 devices, peaking at 57.85 GiB memory usage.

Hyperparameters

Axolotl configuration

axolotl version: 0.12.2

base_model: google/gemma-3n-E2B-it
hub_model_id: sudoping01/bambara-llm-exp3 
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
load_in_4bit: false  
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
ddp: true
chat_template: gemma3n
eot_tokens:
  - <end_of_turn>
special_tokens:
  eot_token: <end_of_turn>
datasets:
  - path: sudoping01/bambara-instructions
    type: chat_template
    split: train
    name: cleaned
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
val_set_size: 0.01
output_dir: ./outputs/bambara-gemma3n-lora-exp4
adapter: lora  
lora_r: 64     
lora_alpha: 128 
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'
sequence_len: 4096 
sample_packing: false
pad_to_sequence_len: false
micro_batch_size: 8  
gradient_accumulation_steps: 2
num_epochs: 3  
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1.2e-4  
warmup_ratio: 0.03
weight_decay: 0.01
bf16: auto
tf32: false
logging_steps: 10
saves_per_epoch: 2  
evals_per_epoch: 2
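
For reference, the effective global batch size implied by this configuration, assuming the 8 data-parallel devices mentioned under Training Procedure, is 128 sequences per optimizer step:

# Effective global batch size implied by the configuration above, assuming the
# 8 data-parallel devices mentioned under "Training Procedure".
micro_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 8
print(micro_batch_size * gradient_accumulation_steps * num_devices)  # 128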

Training Results

Epoch   Step    Training Loss   Validation Loss   Memory (GiB)
0       0       -               7.4595            19.86
0.5     3521    0.8265          0.7787            57.85
1.0     7042    0.7107          0.6745            57.85
1.5     10563   0.6363          0.6026            57.85
2.0     14084   0.5421          0.5429            57.85
2.5     17605   0.5733          0.5039            57.85
3.0     21126   0.5401          0.4952            57.85

Usage

Loading

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "sudoping01/bambara-llm-exp3"
config = PeftConfig.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, model_name)
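
Optionally, the LoRA weights can be merged into the base model to remove the adapter overhead at inference time; this is a standard PEFT step, not a requirement:

# Optional: fold the LoRA weights into the base model and drop the adapter wrappers.
model = model.merge_and_unload()
model.eval()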

Inference

Apply the Gemma chat template:

messages = [
    {"role": "user", "content": "I ni ce! I ka kɛnɛ wa?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.9,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
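
Since translation (Bambara ↔ French/English) is among the listed capabilities, the same chat template can be used for translation-style prompts. The instruction wording below is illustrative, not a format required by the model:

# Illustrative translation-style prompt; the exact instruction phrasing is an assumption.
messages = [
    {"role": "user", "content": "Translate to Bambara: Bonjour, comment allez-vous ?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=1.0, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))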

Ethical Considerations

This model promotes AI equity for low-resource languages but demands responsible use:

  • Cultural Respect: Outputs may not fully reflect all Bambara dialects or nuances; verify with native speakers.
  • Bias Awareness: Potential propagation of source data biases; not for sensitive or decision-making applications without oversight.
  • Accessibility Barriers: Computational requirements may limit deployment in target regions.
  • Misuse Prevention: Ensure applications align with community needs and ethical standards.

Additional Information

This model is part of the broader MALIBA-AI project for African AI. Source code and pipelines are available on GitHub. Future enhancements include multimodal integration, as outlined in related research.

Citation

@article{diallo2025bambara,
  title={Bambara Large Language Model},
  author={Diallo, Seydou},
  journal={Unpublished manuscript},
  year={2025},
  month={July}
}

Framework Versions

  • PEFT 0.17.0
  • Transformers 4.55.2
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4