The basic idea is pretty simple: you give it a few seed prompts, it generates instruction-response pairs, an LLM judge scores each one, the good ones go into your training set, and the bad ones become the seeds for the next round. Each cycle, the model is essentially practicing on what it failed at before (rough sketch of the loop at the end of the post).

You can run the judge completely locally with Ollama if you do not want to send data to any API, and the fine-tuning at the end uses Unsloth on a free Colab GPU, so the whole thing is doable without spending money.

It is more of a practical tool than a research project, but the idea of using failure cases as curriculum is something I find genuinely interesting. Would love to hear if anyone has done something similar. GitHub project link is in the comments below 👇
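To make the loop concrete, here is a minimal sketch of how the generate → judge → re-seed cycle could look with the `ollama` Python client. This is my own illustration, not the repo's actual code: the model names, the prompts, and the 7/10 acceptance threshold are all assumptions.

```python
import json
import ollama  # pip install ollama; assumes a local Ollama server is running

GENERATOR_MODEL = "llama3"  # assumption: any local generator model
JUDGE_MODEL = "llama3"      # assumption: any local judge model
SCORE_THRESHOLD = 7         # assumption: accept pairs scoring >= 7/10

def generate_pair(seed: str) -> dict:
    """Ask the generator for one instruction-response pair based on a seed prompt."""
    prompt = (
        "Write one instruction and a high-quality response inspired by this seed.\n"
        f"Seed: {seed}\n"
        'Reply as JSON: {"instruction": "...", "response": "..."}'
    )
    raw = ollama.chat(
        model=GENERATOR_MODEL,
        messages=[{"role": "user", "content": prompt}],
        format="json",  # constrain output to JSON
    )
    return json.loads(raw["message"]["content"])

def judge(pair: dict) -> int:
    """Score a pair 1-10 with the local judge model."""
    prompt = (
        "Rate the following instruction-response pair from 1 to 10 "
        "for correctness and helpfulness. Reply with only the number.\n"
        f"Instruction: {pair['instruction']}\nResponse: {pair['response']}"
    )
    raw = ollama.chat(model=JUDGE_MODEL, messages=[{"role": "user", "content": prompt}])
    # Brittle if the judge gets chatty; real code would parse more defensively.
    return int(raw["message"]["content"].strip())

def run(seeds: list[str], rounds: int = 3) -> list[dict]:
    """Failure-driven loop: accepted pairs accumulate, rejected ones seed the next round."""
    dataset = []
    for _ in range(rounds):
        next_seeds = []
        for seed in seeds:
            pair = generate_pair(seed)
            if judge(pair) >= SCORE_THRESHOLD:
                dataset.append(pair)  # good: goes into the training set
            else:
                next_seeds.append(pair["instruction"])  # bad: practice on it next round
        if not next_seeds:
            break  # nothing failed; the curriculum is exhausted
        seeds = next_seeds
    return dataset
```

Real code would need retry logic around the judge and more robust JSON parsing, but the key design choice is in the last few lines: rejected instructions are fed back in as the next round's seeds, so each round concentrates on what the model got wrong.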