# qwen3-4B-LinkedArt-100k

## Model Description
This model is a fine-tuned version of Qwen/Qwen3-4B-Base using the brute-force-training package.
- Base Model: Qwen/Qwen3-4B-Base
- Training Status: ✅ Complete
- Generated: 2025-08-14 08:18:27
- Training Steps: 100,000
 
## Training Details

### Dataset
- Dataset: yale-cultural-heritage/linkedart-synthetic-art-non-llm
- Training Examples: 75,000
- Validation Examples: 4,999
 
### Training Configuration

- Max Steps: 100,000
- Batch Size: 3
- Learning Rate: 1e-05
- Gradient Accumulation: 2 steps
- Evaluation Frequency: Every 10,000 steps
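
With a per-device batch of 3 and 2-step gradient accumulation, each optimizer update sees an effective batch of 6 examples. A minimal sketch of the arithmetic (the names below are illustrative, not from the brute-force-training package):

```python
# Under gradient accumulation, gradients from several micro-batches are
# summed before a single optimizer step is taken.
TRAIN_BATCH_SIZE = 3
ACCUMULATION_STEPS = 2
EFFECTIVE_BATCH = TRAIN_BATCH_SIZE * ACCUMULATION_STEPS  # 6 examples per update

def optimizer_updates(micro_steps: int, accumulation_steps: int) -> int:
    """Optimizer updates performed over a run of micro-batch steps."""
    return micro_steps // accumulation_steps

# If the 100,000 max steps count micro-batches, that is 50,000 updates.
```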
 
### Current Performance
- Training Loss: 0.015216
- Evaluation Loss: 0.016040
 
## Pre-Training Evaluation

Initial model performance (before training):
- Loss: 0.740330
- Perplexity: 2.10
- Character Accuracy: 2.9%
- Word Accuracy: 1.8%
 
## Evaluation History

### All Checkpoint Evaluations
| Step | Checkpoint Type | Loss | Perplexity | Char Acc | Word Acc | Improvement vs Pre | 
|---|---|---|---|---|---|---|
| Pre | pre_training | 0.7403 | 2.10 | 2.9% | 1.8% | +0.0% | 
| 10,000 | checkpoint | 0.0264 | 1.03 | 1.9% | 2.2% | +96.4% | 
| 20,000 | checkpoint | 0.0220 | 1.02 | 2.0% | 1.7% | +97.0% | 
| 30,000 | checkpoint | 0.0204 | 1.02 | 1.9% | 1.5% | +97.2% | 
| 40,000 | checkpoint | 0.0191 | 1.02 | 1.8% | 1.8% | +97.4% | 
| 50,000 | checkpoint | 0.0180 | 1.02 | 1.8% | 1.5% | +97.6% | 
| 60,000 | checkpoint | 0.0177 | 1.02 | 2.5% | 3.0% | +97.6% | 
| 70,000 | checkpoint | 0.0173 | 1.02 | 2.0% | 2.4% | +97.7% | 
| 80,000 | checkpoint | 0.0166 | 1.02 | 2.2% | 2.8% | +97.8% | 
| 90,000 | checkpoint | 0.0164 | 1.02 | 2.5% | 3.0% | +97.8% | 
| 100,000 | final | 0.0160 | 1.02 | 2.4% | 2.7% | +97.8% | 
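
The perplexity column follows directly from the loss: for cross-entropy loss measured in nats, perplexity = exp(loss). A quick check against the table:

```python
import math

def perplexity(loss: float) -> float:
    # Cross-entropy loss in nats -> perplexity
    return math.exp(loss)

print(round(perplexity(0.7403), 2))  # pre-training row -> 2.1
print(round(perplexity(0.0160), 2))  # final row -> 1.02

# The "Improvement vs Pre" column is the relative loss reduction:
print(round(100 * (0.7403 - 0.0160) / 0.7403, 1))  # -> 97.8
```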
## Training Progress

### Recent Training Steps (Loss Only)
| Step | Training Loss | Timestamp | 
|---|---|---|
| 99,991 | 0.010545 | 2025-08-14T08:16 | 
| 99,992 | 0.011361 | 2025-08-14T08:16 | 
| 99,993 | 0.015713 | 2025-08-14T08:16 | 
| 99,994 | 0.008295 | 2025-08-14T08:16 | 
| 99,995 | 0.032396 | 2025-08-14T08:16 | 
| 99,996 | 0.020433 | 2025-08-14T08:16 | 
| 99,997 | 0.010563 | 2025-08-14T08:16 | 
| 99,998 | 0.011059 | 2025-08-14T08:16 | 
| 99,999 | 0.024445 | 2025-08-14T08:16 | 
| 100,000 | 0.015216 | 2025-08-14T08:16 | 
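
Per-step losses fluctuate (from 0.008 to 0.032 in the last ten steps alone); averaging over a window gives a steadier signal. Using the ten values from the table above:

```python
# Mean of the last ten training-step losses from the table above.
recent_losses = [0.010545, 0.011361, 0.015713, 0.008295, 0.032396,
                 0.020433, 0.010563, 0.011059, 0.024445, 0.015216]
mean_loss = sum(recent_losses) / len(recent_losses)
print(round(mean_loss, 4))  # ~0.016, in line with the evaluation loss of 0.016040
```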
## Training Visualizations

### Training Progress and Evaluation Metrics
This chart shows the training loss progression, character accuracy, word accuracy, and perplexity over time. Red dots indicate evaluation checkpoints.
### Evaluation Comparison Across All Checkpoints
Comprehensive comparison of all evaluation metrics across training checkpoints. Red=Pre-training, Blue=Checkpoints, Green=Final.
Available visualization files:

- `training_curves.png` - 4-panel view: training loss with eval points, character accuracy, word accuracy, perplexity
- `evaluation_comparison.png` - 4-panel comparison: loss, character accuracy, word accuracy, perplexity across all checkpoints
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# For vision-language models, use the appropriate model class instead.

model = AutoModelForCausalLM.from_pretrained("./final")
tokenizer = AutoTokenizer.from_pretrained("./final")

# Example inference (the prompt format here is an assumption; adapt to your data):
prompt = "You are a helpful assistant who converts input data into LinkedArt JSON.\n\n<your input record>"
inputs = tokenizer(prompt, return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=512)[0], skip_special_tokens=True))
```
## Training Configuration

```json
{
  "dataset_name": "yale-cultural-heritage/linkedart-synthetic-art-non-llm",
  "model_name": "Qwen/Qwen3-4B-Base",
  "max_steps": 100000,
  "eval_steps": 10000,
  "num_accumulation_steps": 2,
  "learning_rate": 1e-05,
  "train_batch_size": 3,
  "val_batch_size": 3,
  "train_select_start": 0,
  "train_select_end": 75000,
  "val_select_start": 75001,
  "val_select_end": 80000,
  "train_field": "train",
  "val_field": "train",
  "input_column": "input",
  "output_column": "output",
  "system_prompt": "You are a helpful assistant who converts input data into LinkedArt JSON.",
  "validation_samples": 500
}
```
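
The select ranges in the config slice a single "train" split into training and validation portions. Assuming half-open ranges (as with the Hugging Face `datasets` pattern `dataset["train"].select(range(start, end))`), example 75,000 falls in neither slice, which is why the validation count is 4,999:

```python
# Half-open ranges matching the config's select_start/select_end values.
train_range = range(0, 75000)    # train_select_start .. train_select_end
val_range = range(75001, 80000)  # val_select_start .. val_select_end

print(len(train_range))  # 75000 -> "Training Examples: 75,000"
print(len(val_range))    # 4999  -> "Validation Examples: 4,999"
```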
## Model Card Metadata

- Base Model: Qwen/Qwen3-4B-Base
- Training Framework: brute-force-training
- Training Type: Fine-tuning
- License: Inherited from base model
- Language: Inherited from base model
 
*This model card was automatically generated by brute-force-training on 2025-08-14 08:18:27.*
## Model Tree

- Model: yale-cultural-heritage/qwen3-4B-LinkedArt-100k
- Base model: Qwen/Qwen3-4B-Base
