Model Card: lily211/moe-llm-127m

Model Details

Model Description

lily211/moe-llm-127m is a small-scale Mixture of Experts (MoE) language model trained for experimentation and research purposes.
It is based on a decoder-only transformer architecture with sparsely activated experts: each token is routed to only a subset of the expert feed-forward layers, so only a fraction of the total parameters is used per forward pass, reducing per-token compute while aiming for competitive performance at this scale. A minimal routing sketch follows below.
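
The sketch below illustrates the core idea of sparse expert routing: a small gating network picks one expert feed-forward block per token, so only the selected expert runs. It is a minimal top-1 routing example, not the actual implementation of moe-llm-127m; the layer sizes and expert count are placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    # Illustrative top-1 MoE feed-forward layer; the real routing strategy,
    # sizes, and expert count in moe-llm-127m may differ.
    def __init__(self, d_model=512, d_ff=2048, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        gate = F.softmax(self.router(x), dim=-1)  # per-token routing probabilities
        top_prob, top_idx = gate.max(dim=-1)      # choose one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                   # tokens routed to expert i
            if mask.any():                        # only selected experts run
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoEFeedForward()
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])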

  • Developed by: lily211
  • Model type: Mixture-of-Experts Language Model
  • Parameters: ~127M (base + experts, sparsely activated)
  • Language(s): English (primary)
  • License: MIT (default; please confirm if different)
  • Finetuned from: None (trained from scratch on a custom decoder-only architecture inspired by GPT-2)

Uses

Direct Use

  • Text generation (English)
  • Research in efficient architectures (MoE scaling)
  • Educational experiments in training & inference with MoE models

Downstream Use

  • Fine-tuning on domain-specific datasets (e.g., instruction-tuning, Q&A, dialogue); a minimal sketch follows this list
  • Distillation into smaller dense models
  • Experimentation with MoE routing strategies
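
A minimal causal-LM fine-tuning sketch with the Hugging Face Trainer is shown below. The file name train.txt, the hyperparameters, and the output directory are illustrative placeholders, not values used to train this model.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "lily211/moe-llm-127m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers lack a pad token

dataset = load_dataset("text", data_files={"train": "train.txt"})  # placeholder corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="moe-llm-127m-finetuned",  # placeholder path
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()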

Out-of-Scope Use

  • Production-ready deployment in safety-critical applications
  • Factual knowledge retrieval or reasoning-intensive tasks
  • Use in sensitive domains without additional fine-tuning, evaluation, and safety checks

Bias, Risks, and Limitations

  • Trained primarily on open web-style text; may reflect biases and stereotypes.
  • Limited knowledge scope compared to larger LLMs.
  • May hallucinate facts, produce incoherent responses, or emit unsafe content.

Recommendations

  • Have a human review outputs before any downstream use.
  • Do not rely on this model for factual accuracy without verification.
  • For safety-sensitive domains, prefer larger, audited LLMs.

How to Get Started with the Model

Example usage with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lily211/moe-llm-127m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Custom MoE architectures may additionally require trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The bluebells are blooming", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)  # greedy decoding, 50 tokens total
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
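
For less repetitive output, sampling arguments can be passed to generate; the values below are illustrative rather than tuned for this model:

outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                         temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))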