qualitative completions

Multilingual-Dolci-SFTed Olmo3 Checkpoint Viewer

Side-by-side completions from the dolci_translated SFT experiment, on prompts sampled from LMArena (stratified by category) across 7 EU languages (cs, de, es, fi, fr, it, sv). Use the slider to step through training progress and see how the SFT runs evolve checkpoint by checkpoint; the pre-SFT base and the SFT baseline are static references. Filter by LMArena category (math, creativity, instruction-following, …) to focus on one behaviour.

Note: A-75en finished in 3998 steps and A-25en in 5398 steps, even though both runs processed the same 2.87M samples × 2 epochs at the same batch size. Translated text packs into more tokens per sample, so a run with more translated data needs more steps to cover the same samples. Tick 4 (final) is the only slider position where both runs are 100% trained; intermediate ticks compare the literal step numbers, so the two runs are at slightly different training fractions there.
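The arithmetic behind this note can be sketched in a few lines. This is only an illustration: the total step counts come from the note above, but the helper names and the example step are hypothetical, and the token-ratio reading assumes that each step consumes a fixed token budget and that both runs cover the same samples.

```python
# Total optimizer steps per run, taken from the note above.
TOTAL_STEPS = {"A-75en": 3998, "A-25en": 5398}


def training_fraction(run: str, step: int) -> float:
    """Fraction of a run's training completed at a given step (capped at 1.0)."""
    total = TOTAL_STEPS[run]
    return min(step, total) / total


def step_ratio() -> float:
    """Ratio of A-25en steps to A-75en steps.

    Assumption: with a fixed token budget per step and identical samples,
    this also approximates the ratio of average tokens per sample.
    """
    return TOTAL_STEPS["A-25en"] / TOTAL_STEPS["A-75en"]


if __name__ == "__main__":
    example_step = 2000  # hypothetical checkpoint step, not a real slider tick
    for run in TOTAL_STEPS:
        frac = training_fraction(run, example_step)
        print(f"{run}: step {example_step} = {frac:.1%} trained")
    print(f"step ratio A-25en / A-75en = {step_ratio():.3f}")
```

At the final tick both fractions are 1.0 by construction. At any shared intermediate step, A-25en is at a smaller fraction of its training than A-75en.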

Category filter: prompts are multi-tagged, and most fall into several categories.