Training Guide
Problem
The mutex lock error [mutex.cc : 452] RAW: Lock blocking... happens because:
- HuggingFace Trainer API tries to use multiprocessing
- macOS doesn't handle multiprocessing from tokenizers well
- Environment variables alone aren't enough to fix it completely
Solution
✅ BEST: Use the Simple Training Script (Recommended)
The simple training script avoids the Trainer API entirely:
python scripts/run_train_simple.py
What it does:
- ✅ No multiprocessing
- ✅ No threading issues
- ✅ Direct PyTorch training loop
- ✅ Works on macOS
- ✅ Same results as the Trainer API
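For orientation, the shape of a direct, single-process training loop is sketched below. This is not the code in run_train_simple.py; the function name train_simple and its parameters are hypothetical. The sketch only assumes that model, optimizer and the loss object follow the usual PyTorch calling conventions (train(), zero_grad(), backward(), step(), item()) and that loader yields (inputs, labels) batches.

```python
def train_simple(model, loader, optimizer, loss_fn, epochs=2):
    """Single-process training loop following PyTorch calling conventions.

    Assumption: `model` behaves like torch.nn.Module, `optimizer` like
    torch.optim.Optimizer, and `loader` yields (inputs, labels) batches.
    Returns the mean loss of each epoch.
    """
    epoch_losses = []
    for _ in range(epochs):
        model.train()                      # switch to training mode
        total, batches = 0.0, 0
        for inputs, labels in loader:      # batches are read in the main process
            optimizer.zero_grad()          # clear gradients from the previous step
            loss = loss_fn(model(inputs), labels)
            loss.backward()                # backpropagate
            optimizer.step()               # update the weights
            total += loss.item()
            batches += 1
        epoch_losses.append(total / max(batches, 1))
    return epoch_losses
```

With PyTorch, passing a DataLoader created with num_workers=0 keeps batch loading in the main process, so the loop starts no worker processes of its own.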
Output:
- Trains for 2 epochs
- Shows progress with tqdm
- Saves model to models/ai_detector
Alternative: Shell Script
bash train_macos.sh
This sets all environment variables and runs the simple script.
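If you prefer to stay in Python, a launcher along these lines does roughly the same job. The contents of train_macos.sh are not shown in this guide, so the variables below are an assumption: TOKENIZERS_PARALLELISM, OMP_NUM_THREADS and MKL_NUM_THREADS are the ones commonly set to limit parallelism. The names build_env and run_training are hypothetical.

```python
import os
import subprocess
import sys

# Assumed settings; the variables actually set by train_macos.sh may differ.
SINGLE_THREAD_ENV = {
    "TOKENIZERS_PARALLELISM": "false",  # disable parallelism in Hugging Face tokenizers
    "OMP_NUM_THREADS": "1",             # limit OpenMP to one thread
    "MKL_NUM_THREADS": "1",             # limit Intel MKL to one thread
}

def build_env(base=None):
    """Return a copy of the environment with the single-thread settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(SINGLE_THREAD_ENV)
    return env

def run_training(script="scripts/run_train_simple.py"):
    """Run the training script in a child process and return its exit code."""
    return subprocess.run([sys.executable, script], env=build_env()).returncode
```

Calling run_training() from the project root starts the simple script with those variables set before any library is imported in the child process.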
If You Still Get Errors
Option 1: Reduce to Tiny Dataset
python scripts/sample_dataset.py data/ai_vs_human_text.csv data/tiny.csv -n 100
# Then edit configs/default.yaml:
# data_path: data/tiny.csv
python scripts/run_train.py
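The sampling step could look roughly like the sketch below. It is not the code in scripts/sample_dataset.py, whose implementation is not shown here; sample_rows and sample_csv are hypothetical names, and the sketch assumes the CSV has a single header row.

```python
import csv
import random

def sample_rows(rows, n=100, seed=0):
    """Return up to n rows chosen at random, reproducibly for a given seed."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows[:n]

def sample_csv(src, dst, n=100, seed=0):
    """Copy the header plus up to n random rows from src to dst."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)          # assumption: first row is the header
        picked = sample_rows(reader, n, seed)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(picked)
    return len(picked)
```

For example, sample_csv("data/ai_vs_human_text.csv", "data/tiny.csv", n=100) would write a 100-row file that data_path can then point to.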
Option 2: Run Outside venv
# Exit your virtualenv
deactivate
# Install system-wide
pip install --user -r requirements.txt
# Train
python scripts/run_train_simple.py
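Before training, it can help to confirm which interpreter is actually running and whether a virtualenv is still active. A small check, assuming Python 3 (the helper name in_virtualenv is hypothetical):

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv or virtualenv."""
    # Inside a virtual environment, sys.prefix points at the environment
    # while sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtualenv active:", in_virtualenv())
```

If this still reports an active virtualenv after deactivate, the python on your PATH is probably not the system interpreter.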
Option 3: Use Colab/Cloud
If nothing works locally, train on Google Colab (free GPU):
- Upload your data to Google Drive
- Use the Colab notebook template
- Much faster training
Key Differences
Simple Script (run_train_simple.py)
- ✅ No Trainer API (no multiprocessing issues)
- ✅ Works on macOS
- ✅ Good for small-medium datasets
- ⚠️ Less efficient on large datasets
Standard Script (run_train.py)
- Uses HuggingFace Trainer API
- ✅ Optimized for large datasets
- ⚠️ Multiprocessing issues on macOS
Recommended Setup
- Dataset: ✅ Downloaded (data/ai_vs_human_text.csv)
- Config: ✅ Updated (configs/default.yaml)
- Training: Use run_train_simple.py
Start Training
python scripts/run_train_simple.py
You should see output like:
```
Starting training (simple mode - no multiprocessing)
Loading data from data/ai_vs_human_text.csv...
Loaded 1,000 samples
Distribution: {0: 493, 1: 507}
Train: 800 | Val: 200
Loading model: roberta-base...
Creating datasets...
Training for 2 epochs...
```
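The numbers in that log are consistent with an 80/20 train/validation split of 1,000 samples. How the script actually splits the data (shuffled, stratified, fixed seed) is not stated in this guide, so the following is only a sketch of one plain shuffled split; split_train_val, val_fraction and seed are hypothetical names.

```python
import random
from collections import Counter

def split_train_val(labels, val_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train and validation sets."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(indices) * val_fraction)
    return indices[n_val:], indices[:n_val]

labels = [0] * 493 + [1] * 507          # same class counts as the sample log
train_idx, val_idx = split_train_val(labels)
print("Distribution:", dict(Counter(labels)))
print(f"Train: {len(train_idx)} | Val: {len(val_idx)}")
```

A plain shuffle like this does not guarantee the same class balance in both parts; with classes as close to even as 493 vs 507, that rarely matters.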
Good luck!