I've decided to name this model

  • This model is dubbed SD15 Flow-Matching Sol - twin sister to the alternative Try2 who is named SD15 - Lune.

Sun and Moon.

Plan Update: 11/1/2025

I'm sticking to the positive spectrum here, knowing that 6 million samples isn't enough to converge sd15. I believe it will take around 10 million before correct shapes START showing with texture rather than flat regions or blobs, but I've been wrong before - and we will make happy little bushes out of this if I am.

Our flow match troopers are trying their best, but the outlook isn't particularly good yet. Blobs all the way to epoch 30. That's roughly 200,000 samples × 30 epochs, or about 6 million images' worth. Not enough to fully saturate the system, but more than what I used for the sdxl vpred conversions. A refined process may be needed, with synthetic dreambooth-styled images devoted to top-prio, mid-prio, and low-prio classes.

When the distillation concludes, additional finetuning will follow in any case, using images generated directly from sd15 with class-based specifics. So it'll be an interesting outcome for both the baseline starter and the v2 trained version. I have high hopes either way, and I will have the class-based dreambooth-style selector ready to begin immediately after epoch 50.

Earlier updates

This is the layout of the PT checkpoint. You want the student entry unless you plan to continue training.
I KNOW I KNOW, I'll get it worked out. For now, every epoch from 9 onward is saved this way if you see a PT for this particular model.

  "cfg": asdict(self.cfg),
  "student": self.student.state_dict(),
  "opt": self.opt.state_dict(),
  "sched": self.sched.state_dict(),
  "gstep": gstep

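A minimal sketch of pulling only the student weights out of one of these PT files, assuming the file was written with `torch.save` and holds exactly the keys listed above. The helper names `extract_student` and `load_student_from_pt` are hypothetical and not part of the trainer.

```python
from typing import Any, Dict

# Keys listed above as the contents of each PT checkpoint.
EXPECTED_KEYS = {"cfg", "student", "opt", "sched", "gstep"}


def extract_student(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the student UNet state dict from a loaded checkpoint.

    Hypothetical helper: it assumes `ckpt` has the layout shown above.
    """
    missing = EXPECTED_KEYS - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
    return ckpt["student"]


def load_student_from_pt(path: str) -> Dict[str, Any]:
    """Load a PT file on CPU and return the student weights (requires torch)."""
    import torch  # imported lazily so extract_student works without torch

    # weights_only=False because the file also stores config and optimizer
    # state; only do this for checkpoints you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    return extract_student(ckpt)
```

The returned dict can then be passed to a UNet's `load_state_dict`, provided the key names match the target architecture.
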
I started a second run. They are both running simultaneously. I want to see if by epoch 10 the new trainer produces a better epoch 10 than this one.

As of epoch 11 the blobs are re-forming into shapes, and the shapes are cohering in ways that look fairly usable for the end product. For the time being, however, they are still blobs.

This is a v-prediction flow-matching model that can be run for inference directly through diffusers with Euler discrete flow-matching scheduling, and I would advise doing so for testing purposes.
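
For intuition, here is a toy sketch of what an Euler flow-matching sampler does at each step. It assumes the rectified-flow convention x_σ = (1 − σ)·x₀ + σ·ε with velocity v = ε − x₀, which may not match this model's exact parameterization. `euler_flow_sample` and `velocity_fn` are hypothetical names, and real inference would go through a diffusers scheduler rather than this loop.

```python
from typing import Callable, List


def euler_flow_sample(
    x: List[float],
    velocity_fn: Callable[[List[float], float], List[float]],
    sigmas: List[float],
) -> List[float]:
    """Integrate dx/dsigma = v from sigmas[0] down to sigmas[-1] with Euler steps."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        # Euler update: move along the predicted velocity by the sigma increment.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Toy check with a known target: a perfect model would predict v = eps - x0.
x0 = [0.5, -1.0]
eps = [1.5, 2.0]


def perfect_v(x: List[float], sigma: float) -> List[float]:
    """Stand-in for the UNet: returns the exact velocity for this toy problem."""
    return [e - c for e, c in zip(eps, x0)]


sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
result = euler_flow_sample(list(eps), perfect_v, sigmas)
```

With an exact velocity the loop walks from pure noise back to `x0`; a partially trained student lands somewhere short of that, which is one way to read the blobs.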

E11+ new expectations

It's training directly with timestep-awareness using shift and timestep association.

The least accurate timestep buckets have their opinions removed from the classifier weight, since the classifier cannot help if it cannot classify what the teacher itself is trying to say. This lines up roughly with the 90/10 rule that David seems to cap at, which is about 90% accuracy and 10% incorrect. So about 10% of timestep buckets are inactive.
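
One way to picture this bucket gating, as a hedged sketch rather than the trainer's actual code: buckets whose measured accuracy falls below a threshold get zero weight, and the remaining weights are renormalized. The function name, the example accuracies, and the uniform starting weights are all assumptions. The default threshold mirrors `reliability_threshold` from the config.

```python
from typing import List


def gate_timestep_buckets(
    accuracies: List[float],
    weights: List[float],
    reliability_threshold: float = 0.15,
) -> List[float]:
    """Zero out buckets the assessor cannot classify reliably, then renormalize."""
    gated = [
        w if acc >= reliability_threshold else 0.0
        for acc, w in zip(accuracies, weights)
    ]
    total = sum(gated)
    if total == 0.0:
        raise ValueError("no bucket passed the reliability threshold")
    return [w / total for w in gated]


# Ten hypothetical buckets; one falls below the threshold and becomes inactive.
accuracies = [0.95, 0.92, 0.90, 0.91, 0.88, 0.93, 0.89, 0.94, 0.90, 0.05]
bucket_weights = gate_timestep_buckets(accuracies, [1.0] * 10)
```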

Individual block losses have been correctly reintroduced and will train the timestep and patterns HOPEFULLY correctly.

    # Timestep Weighting (David-guided adaptive sampling)
    use_timestep_weighting: bool = True
    use_david_weights: bool = True
    timestep_shift: float = 3.0  # SD3-style shift (higher = bias toward clean)
    base_jitter: int = 5  # Base ±jitter around bin center
    adaptive_chaos: bool = True  # Scale jitter by pattern difficulty
    profile_samples: int = 2500  # Samples to profile David's difficulty
    reliability_threshold: float = 0.15  # Minimum accuracy to trust David's guidance
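
For reference, the shift commonly used with SD3-style flow matching remaps a noise level σ in [0, 1] as σ' = s·σ / (1 + (s − 1)·σ). The sketch below assumes `timestep_shift` is applied this way and that σ = 1 means pure noise. Neither is confirmed by the trainer code, and the direction of the bias depends on which end of the range the trainer treats as clean.

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """SD3-style timestep shift; shift=1.0 leaves sigma unchanged."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Endpoints stay fixed while interior values move; 0.5 maps to 0.75 at shift=3.
shifted = [shift_sigma(s) for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
```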

Most original checkpoints are default sd15 after testing

For those who downloaded the models that either exhibit blobs or don't use flow matching noise - my sincerest apologies. They are defective. Blobs are expected, standard noise is not.

The CURRENT e8 has no clip or vae, so it's just sitting there standalone. This is the currently newest valid one and it functions as expected - by making blobs due to early pretraining.

I removed the faulty checkpoints, so only the correct checkpoints remain. They come from early training and are not yet compatible with correct inference.

Updated information

It's basically just blobs as expected, so don't expect much yet. It has a long way to go.

The current one is responding to shift as it should. It will most likely need an additional 40 or so epochs before convergence is possible, and I will be uploading every epoch as a PT from this point forward to guarantee cohesion and transparency. So about 4 days, give or take. Not too bad, all things considered. So let's hope it actually works out, huh. If not, I'll just train it directly using a different technique without David.

My sincerest apologies for all of the blunders and the problems. I didn't expect so many problems but I did expect some.

I ended up having to use debug to salvage epoch 8 so I wouldn't have to restart. The metrics appear corrupted as well. The safetensor outputs were saving the original sd15 with silently mismatched keys thanks to the diffusers script not operating as intended. Additionally, the subsystems that I implemented never tripped the flags they needed to - in order to ensure backup. So the system was culling the PTs. Between a rock and a hard place I figured out how to salvage it and here we are - thanks to a combination of Gemini's information and Claude's code debugging and problem solving the training can continue.
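
A strict key comparison before saving is one way to surface this kind of silent mismatch. The sketch below is a hypothetical guard, not the fix that was actually applied. `diff_state_dict_keys`, `assert_keys_match`, and the example key names are illustrative.

```python
from typing import Dict, Iterable, List


def diff_state_dict_keys(
    expected: Iterable[str], actual: Iterable[str]
) -> Dict[str, List[str]]:
    """Report keys present on only one side of a checkpoint conversion."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),
        "unexpected": sorted(actual_set - expected_set),
    }


def assert_keys_match(expected: Iterable[str], actual: Iterable[str]) -> None:
    """Fail loudly instead of letting mismatched keys pass silently."""
    report = diff_state_dict_keys(expected, actual)
    if report["missing"] or report["unexpected"]:
        raise KeyError(f"state dict key mismatch: {report}")


# Illustrative key names only; real UNet checkpoints have many more entries.
report = diff_state_dict_keys(
    ["conv_in.weight", "conv_in.bias"],
    ["model.diffusion_model.input_blocks.0.0.weight", "conv_in.bias"],
)
```

In PyTorch, calling `load_state_dict` with `strict=True` (the default) raises on the same condition.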

More faults, more problems, but I still managed to salvage the real one

How absurd and difficult anything SD15 has been to debug.

Okay I am correctly converting the valuation and can now properly test the unet for diffusion testing.

CKPT Bumbles

Apologies, the safetensors ARE ComfyUI formatted... CKPT. :V

Rename the extension to ckpt, because it's clearly incorrect. I'll convert them asap, my apologies for not micro-managing more closely.

I renamed epoch 4 in the repo.

Training continues.

I updated the trainer (trainer.py) with a version that handles checkpoint saving and loading using the correct ComfyUI script.

It will automatically load and run in Colab, and it is prepared to continue training from the most recent checkpoint here. You can point it at your own repo to load and save.

SD1.5 Flow-Matching Distillation with Geometric Guidance (EXPERIMENTAL)

The day disappeared on me and the trainer stayed down most of the day because I was working on other tasks.

Colab randomly died last night after epoch 7, and I found out the thing wasn't uploading so now I'm just going to have to restart from epoch 3 - the version I manually uploaded before I went to bed. They were supposed to be pooled in a private repo but they weren't.

⚠️ Experimental Research

Status: Training in progress | No guarantees of convergence or quality

This is an experimental approach to distilling Stable Diffusion 1.5 using flow matching with geometric guidance from GeoDavidCollective. Results are not yet validated.

Overview

This trainer attempts to distill Stable Diffusion 1.5 using v-prediction flow matching with adaptive per-block weighting based on geometric quality assessment. Unlike traditional distillation that treats all UNet blocks equally, this approach uses a pre-trained geometric model (David) to evaluate student features and dynamically adjust training emphasis per block.

Hypothesis: Geometric guidance may help the student learn SD1.5's internal structure more effectively by:

  • Identifying which blocks are learning poorly
  • Applying stronger supervision where needed
  • Maintaining geometric stability during training

Status: Hypothesis untested. Requires ablation study comparing David-guided vs. vanilla flow matching.

Architecture

Three-Component System

Teacher (SD1.5 UNet, frozen, FP16)
  ↓ provides ε* → v* targets + features
  
Student (Trainable UNet, FP16)
  ↓ predicts v̂ + features
  
Flow Matching Loss: MSE(v̂, v*)

+

David Assessor (GeoDavidCollective, frozen, 872M params)
  ↓ evaluates student features per block
  ↓ outputs: e_t (timestep error), e_p (pattern entropy), coh (coherence)
  
Fusion System: λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))
  ↓ converts metrics to per-block penalties
  
Block Losses: Σ λ_b · (KD loss per block)

Total: L_flow + block_weight · L_blocks

Components

Teacher: SD1.5 UNet (frozen, FP16)

  • Provides ground truth for flow matching
  • Extracts spatial features per block

Student: Trainable UNet (FP16)

  • Initialized from teacher weights
  • Learns v-prediction objective
  • Features assessed by David

David: GeoDavidCollective (frozen)

  • Pre-trained geometric model
  • Evaluates feature quality per block
  • Provides adaptive weighting signals

Fusion: Dynamic penalty calculator

  • λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))
  • Bounded: [0.5, 3.0]
  • Higher λ = more training emphasis

Training Configuration

Dataset

Source: SymbolicPromptDataset (synthetic prompts)
Samples: 200,000
Batch Size: 64
Epochs: 10
Workers: 2

Optimization

Optimizer: AdamW
Learning Rate: 1e-4
Weight Decay: 1e-3
Scheduler: CosineAnnealingLR
Gradient Clipping: 1.0
Mixed Precision: Enabled (FP16)
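
To show what this schedule does to the learning rate, here is the closed-form cosine annealing curve. It assumes the scheduler is stepped once per batch, that its horizon covers the full 10 epochs, and that the minimum rate is 0 (PyTorch's default for `CosineAnnealingLR`). None of these are stated in the source.

```python
import math


def cosine_annealing_lr(
    step: int, t_max: int, base_lr: float = 1e-4, eta_min: float = 0.0
) -> float:
    """eta_min + (base_lr - eta_min) * (1 + cos(pi * step / t_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * step / t_max)) / 2.0


# Assumed horizon: 10 epochs of 200,000 samples at batch size 64.
steps_per_epoch = 200_000 // 64
t_max = 10 * steps_per_epoch
lr_start = cosine_annealing_lr(0, t_max)
lr_mid = cosine_annealing_lr(t_max // 2, t_max)
lr_end = cosine_annealing_lr(t_max, t_max)
```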

Loss Weights

Global Flow Weight: 1.0
Block Penalty Weight: 0.05  # Critical hyperparameter!
KD Weight: 0.25 (cosine similarity on pooled features)
Local Flow Heads: Disabled

David Fusion

Base Block Weights:
  down_0: 0.7, down_1: 0.9, down_2: 1.0, down_3: 1.1
  mid: 1.2, up_0: 1.1, up_1: 1.0, up_2: 0.9, up_3: 0.7

Fusion Coefficients:
  alpha (timestep): 0.5
  beta (pattern): 0.25
  delta (incoherence): 0.25

Lambda Bounds: [0.5, 3.0]
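
The fusion rule with these values can be sketched as follows. The clamp is assumed to apply to the final λ_b, after multiplying by the base weight, and the assessor outputs used in the example are made up for illustration.

```python
from typing import Dict

BASE_BLOCK_WEIGHTS: Dict[str, float] = {
    "down_0": 0.7, "down_1": 0.9, "down_2": 1.0, "down_3": 1.1,
    "mid": 1.2,
    "up_0": 1.1, "up_1": 1.0, "up_2": 0.9, "up_3": 0.7,
}
ALPHA, BETA, DELTA = 0.5, 0.25, 0.25
LAMBDA_MIN, LAMBDA_MAX = 0.5, 3.0


def fuse_lambda(block: str, e_t: float, e_p: float, coh: float) -> float:
    """lambda_b = w_b * (1 + alpha*e_t + beta*e_p + delta*(1 - coh)), clamped."""
    raw = BASE_BLOCK_WEIGHTS[block] * (
        1.0 + ALPHA * e_t + BETA * e_p + DELTA * (1.0 - coh)
    )
    return min(max(raw, LAMBDA_MIN), LAMBDA_MAX)


# Hypothetical assessor outputs for illustration.
lam_easy = fuse_lambda("down_0", e_t=0.0, e_p=0.0, coh=1.0)  # base weight only
lam_hard = fuse_lambda("mid", e_t=2.0, e_p=2.0, coh=0.0)  # clamped at the upper bound
```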

Training Progress (Epoch 1/10)

Current Metrics

L_total: 0.24
L_flow: 0.23
L_blocks: 0.07
Speed: ~1.5 it/s (A100)

Interpretation:

  • Block losses balanced after fixing block_penalty_weight
  • Flow loss converging as expected
  • No evidence of collapse or divergence yet
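
As a quick sanity check, the logged values can be plugged into the total-loss formula from the architecture section, using the configured block penalty weight of 0.05. This assumes no other terms contribute to L_total, which the source does not confirm.

```python
def total_loss(l_flow: float, l_blocks: float, block_weight: float = 0.05) -> float:
    """L_total = L_flow + block_weight * L_blocks."""
    return l_flow + block_weight * l_blocks


# Logged values are rounded to two decimals, so only approximate agreement is expected.
estimate = total_loss(l_flow=0.23, l_blocks=0.07)
block_share = (0.05 * 0.07) / estimate
```

The estimate lands close to the logged 0.24 given two-decimal rounding of the inputs, and at this weight the block term makes up only a small fraction of the total.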

Expected Timeline (Unvalidated)

Epoch 1-2: Loss stabilization
Epoch 3-5: Feature structure learning (images may be blurry)
Epoch 8-10: Potential convergence (quality unknown)

Note: No baseline comparison yet. Cannot claim faster/better convergence without ablation study.

Model Files

Training saves checkpoints as:

checkpoints/
├── checkpoint_epoch_002.safetensors
├── checkpoint_epoch_004.safetensors
└── final.safetensors

Each checkpoint contains student UNet weights only.

Inference

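The model can be sampled using standard diffusion samplers (DDPM, DDIM) with v-prediction:

# Sketch only - `step` stands in for the sampler's v-prediction update (details TBD)
def sample(student_unet, vae, step, timesteps, noise, text_embeddings):
    x_t = noise  # start from pure noise
    for t in reversed(timesteps):  # assumes timesteps are listed in ascending order
        v = student_unet(x_t, t, text_embeddings)  # predicted velocity
        x_t = step(x_t, v, t)  # v-prediction update
    return vae.decode(x_t)  # decode latents to an image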

Requires SD1.5 VAE and text encoder (not included in checkpoint).

Known Issues

  • ❓ No proof this approach works better than vanilla distillation
  • ❓ Optimal block_penalty_weight unknown (currently 0.05)
  • ❓ May require tuning lambda bounds for different datasets
  • ❓ Inference quality unvalidated

Future Work

Required Validation

  1. Ablation Study: Train identical model WITHOUT David guidance
  2. Quality Metrics: FID, CLIP score vs. SD1.5 baseline
  3. Convergence Analysis: Compare learning curves
  4. Inference Testing: Visual quality assessment

Potential Improvements

  • Adaptive block_penalty_weight scheduling
  • Per-block learning rates
  • David warmup strategy
  • Better fusion formulas

Experimental Design

Hypothesis

Geometric guidance from David will improve distillation by:

  1. Identifying poorly-learning blocks
  2. Applying adaptive supervision
  3. Maintaining feature geometry

Test Plan

Control: SD1.5 flow matching (no David)
Treatment: SD1.5 flow matching + David guidance
Metrics: Loss curves, FID, CLIP score, visual quality

Success Criteria

  • Faster convergence (fewer epochs to target loss)
  • Better final quality (lower FID)
  • More stable training (less variance)

Status: Experiment in progress, no results yet.

Technical Details

David Assessment

Per block, David outputs:

  • e_t: Cross-entropy on timestep classification (proxy for temporal understanding)
  • e_p: Entropy on pattern classification (proxy for feature diversity)
  • coh: Cantor alpha (geometric coherence metric)

These are converted to penalty multipliers via the fusion formula.

Flow Matching

v-prediction objective:

v* = α · ε - σ · x₀  (target)
v̂ = student(x_t, t)  (prediction)
L_flow = MSE(v̂, v*)

where α and σ come from the noise schedule.
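
A small numeric sketch of this objective on scalars, assuming a variance-preserving schedule (α² + σ² = 1) and the forward process x_t = α·x₀ + σ·ε. Under those assumptions the clean sample can be recovered from a perfect velocity prediction as x₀ = α·x_t − σ·v. The angle parameterization and the function names are illustrative choices, not taken from the trainer.

```python
import math
from typing import Tuple


def alpha_sigma(phi: float) -> Tuple[float, float]:
    """Variance-preserving coefficients from an angle phi in [0, pi/2]."""
    return math.cos(phi), math.sin(phi)


def v_target(x0: float, eps: float, alpha: float, sigma: float) -> float:
    """v* = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0


def x0_from_v(x_t: float, v: float, alpha: float, sigma: float) -> float:
    """Invert the v-parameterization; valid when alpha^2 + sigma^2 = 1."""
    return alpha * x_t - sigma * v


alpha, sigma = alpha_sigma(math.pi / 6)
x0, eps = 0.8, -0.3
x_t = alpha * x0 + sigma * eps  # noised sample
v_star = v_target(x0, eps, alpha, sigma)
recovered = x0_from_v(x_t, v_star, alpha, sigma)
```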

Per-Block KD

Cosine similarity on spatial-pooled features:

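import torch.nn.functional as F

# Assumes features are shaped (batch, channels, height, width) and a batch-mean reduction.
L_kd = 1 - F.cosine_similarity(
    student_features.mean(dim=(2, 3)),  # pool over spatial dims
    teacher_features.mean(dim=(2, 3)),
    dim=1,
).mean()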

Dependencies

torch >= 2.0
diffusers >= 0.21
transformers >= 4.30
safetensors >= 0.3
huggingface_hub >= 0.16

Plus custom repo: geovocab2 (for David model and data synthesis)

Hardware Requirements

  • Training: A100 40GB (FP16 mixed precision)
  • Inference: RTX 3090 / A6000 (24GB)
  • Storage: ~10GB for checkpoints + logs

Reproducibility

Training is deterministic with fixed seed (42), but:

  • Depends on David checkpoint version
  • May be sensitive to hardware (GPU type)
  • Synthetic data generation has randomness

Limitations

  1. Untested: No validation that this works
  2. SD1.5 Only: Hardcoded for SD1.5 architecture
  3. David Dependency: Requires specific pre-trained model
  4. Synthetic Data: Trained on generated prompts, not real captions
  5. No Safety: Inherits SD1.5 biases, no content filtering

Ethical Considerations

  • Inherits biases from SD1.5 training data
  • No additional safety measures implemented
  • Should not be deployed without content filtering
  • Research purposes only

Citation

@software{sd15flowmatch2025,
  author = {AbstractPhil},
  title = {SD1.5 Flow-Matching with Geometric Guidance (Experimental)},
  year = {2025},
  url = {https://huggingface.co/AbstractPhil/sd15-flow-matching},
  note = {Experimental distillation approach, results unvalidated}
}

License

MIT License

Related Work


Current Status: 🧪 Experimental training in progress

Do not use for production - validation pending
