Made a tool that builds its own training data and improves each cycle by learning from what it got wrong
The basic idea is pretty simple. You give it a few seed prompts. It generates instruction-response pairs and an LLM scores each one. The good ones go into your training set, and the bad ones become the seeds for the next round. Each cycle the model i…
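
Here's a rough sketch of that loop in Python, just to make the flow concrete. This is not the tool's actual code: `run_cycles`, `toy_generate`, `toy_score`, the `threshold` value and the cycle count are all made-up names and numbers for illustration. It also assumes a rejected pair "becomes a seed" by reusing its instruction text, which may not be how the real tool does it. In practice `generate` and `score` would be LLM calls; here they're toy stand-ins so the example runs on its own.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (instruction, response)


def run_cycles(
    seeds: List[str],
    generate: Callable[[str], Pair],
    score: Callable[[Pair], float],
    threshold: float = 0.7,  # hypothetical cutoff between "good" and "bad"
    cycles: int = 3,         # hypothetical number of rounds
) -> List[Pair]:
    """Generate pairs from seeds, keep the good ones, recycle the bad ones."""
    training_set: List[Pair] = []
    for _ in range(cycles):
        next_seeds: List[str] = []
        for seed in seeds:
            pair = generate(seed)
            if score(pair) >= threshold:
                # Good pair: goes into the training set.
                training_set.append(pair)
            else:
                # Bad pair: its instruction seeds the next round (assumption).
                next_seeds.append(pair[0])
        if not next_seeds:
            break  # nothing failed, so there is nothing left to retry
        seeds = next_seeds
    return training_set


# Toy stand-ins for the two LLM calls, so the sketch runs without any API.
def toy_generate(seed: str) -> Pair:
    return (seed + "?", f"response to {seed}")


def toy_score(pair: Pair) -> float:
    # Pretend longer instructions are better.
    return 1.0 if len(pair[0]) >= 6 else 0.0


if __name__ == "__main__":
    for instruction, response in run_cycles(["abc", "abcdefgh"], toy_generate, toy_score):
        print(instruction, "->", response)
```

With the toy functions, the short seed fails twice and only passes on the third round, while the long one passes right away, so you can see both paths through the loop.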