qualitative completions

Multilingual-Dolci-SFTed Olmo3 Checkpoint Viewer

Side-by-side completions from the dolci_translated SFT experiment, on prompts sampled from LMArena (stratified by category) across 7 EU languages (cs, de, es, fi, fr, it, sv). Slide through training progress to see how the SFT runs evolve checkpoint by checkpoint; the pre-SFT base and the SFT baseline are static references. Filter by LMArena category (math, creativity, instruction-following, …) to focus on one behaviour.

Note: A-75en finished training in 3998 steps and A-25en in 5398 steps (about 35% more), even though both runs processed the same 2.87M samples × 2 epochs at the same batch size. Translated text packs into more tokens per sample, so it takes more steps to cover the same data. Tick 4 (final) is the only slider position where both runs are 100% trained; intermediate ticks compare the same literal step numbers, so the two runs sit at slightly different training fractions.
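To make the mismatch at intermediate ticks concrete, here is a minimal sketch that turns a literal step number into a training fraction for each run. The total step counts come from the note above; the function names and the example step values are hypothetical and are not the viewer's actual tick positions.

```python
# Total optimizer steps per run, as stated in the note above.
TOTAL_STEPS = {"A-75en": 3998, "A-25en": 5398}


def training_fraction(run: str, step: int) -> float:
    """Fraction of a run completed at `step`, capped at 1.0."""
    return min(step / TOTAL_STEPS[run], 1.0)


def compare_at_step(step: int) -> dict[str, float]:
    """Training fraction of every run at the same literal step number."""
    return {run: training_fraction(run, step) for run in TOTAL_STEPS}


if __name__ == "__main__":
    # Hypothetical step values, chosen only for illustration.
    for step in (1000, 2000, 3000):
        fractions = compare_at_step(step)
        print(step, {run: f"{frac:.1%}" for run, frac in fractions.items()})
```

At the same step number, A-25en is always the less-trained of the two until both runs reach their final checkpoint, which is why only the final tick is a like-for-like comparison.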

Language

Prompt

Training progress (A-75en / A-25en)

early final

Show / hide

Filter prompts by category

Prompts are multi-tagged; most fall into several categories.
