Model Card: lily211/moe-llm-127m

Model Details

Model Description

lily211/moe-llm-127m is a small-scale Mixture of Experts (MoE) language model trained for experimentation and research purposes.
It is based on a decoder-only transformer architecture with sparsely activated experts: each token is routed to only a subset of the expert feed-forward layers, so only a fraction of the total parameters is used per forward pass, reducing per-token compute while aiming for competitive performance at this scale. A minimal routing sketch follows below.
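
The sketch below illustrates the core idea of sparse expert routing: a small gating network picks one expert feed-forward block per token, so only the selected expert runs. It is a minimal top-1 routing example, not the actual implementation of moe-llm-127m; the layer sizes and expert count are placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    # Illustrative top-1 MoE feed-forward layer; the real routing strategy,
    # sizes, and expert count in moe-llm-127m may differ.
    def __init__(self, d_model=512, d_ff=2048, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        gate = F.softmax(self.router(x), dim=-1)  # per-token routing probabilities
        top_prob, top_idx = gate.max(dim=-1)      # choose one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                   # tokens routed to expert i
            if mask.any():                        # only selected experts run
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoEFeedForward()
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])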

  • Developed by: lily211
  • Model type: Mixture-of-Experts Language Model
  • Parameters: ~127M (base + experts, sparsely activated)
  • Language(s): English (primary)
  • License: MIT (default; please confirm if different)
  • Finetuned from: None (trained from scratch on a custom decoder-only architecture inspired by GPT-2)

Uses

Direct Use

  • Text generation (English)
  • Research in efficient architectures (MoE scaling)
  • Educational experiments in training & inference with MoE models

Downstream Use

  • Fine-tuning on domain-specific datasets (e.g., instruction-tuning, Q&A, dialogue); a minimal sketch follows this list
  • Distillation into smaller dense models
  • Experimentation with MoE routing strategies
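
A minimal causal-LM fine-tuning sketch with the Hugging Face Trainer is shown below. The file name train.txt, the hyperparameters, and the output directory are illustrative placeholders, not values used to train this model.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "lily211/moe-llm-127m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers lack a pad token

dataset = load_dataset("text", data_files={"train": "train.txt"})  # placeholder corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="moe-llm-127m-finetuned",  # placeholder path
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()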

Out-of-Scope Use

  • Production-ready deployment in safety-critical applications
  • Factual knowledge retrieval or reasoning-intensive tasks
  • Use in sensitive domains without additional fine-tuning, evaluation, and safety checks

Bias, Risks, and Limitations

  • Trained primarily on open web-style text; may reflect biases and stereotypes.
  • Limited knowledge scope compared to larger LLMs.
  • May hallucinate facts, produce incoherent responses, or emit unsafe content.

Recommendations

  • Have a human review outputs before any downstream use.
  • Do not rely on this model for factual accuracy without verification.
  • For safety-sensitive domains, prefer larger, audited LLMs.

How to Get Started with the Model

Example usage with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lily211/moe-llm-127m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Custom MoE architectures may additionally require trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The bluebells are blooming", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)  # greedy decoding, 50 tokens total
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
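
For less repetitive output, sampling arguments can be passed to generate; the values below are illustrative rather than tuned for this model:

outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                         temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))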