Based on a bit of research and a lot of gut feeling, I offer the following speculation:
- if you self-trained an LLM against a Python interpreter or Java compiler in a feedback loop where it learned from its own mistakes, it could become dramatically better at coding (a rough sketch of such a loop follows this list). It's actually a miracle that today's models are "decent" at coding despite getting virtually no feedback from an interpreter or compiler.
- one could train not merely on input and output, but also on the execution trace, so the LLM learns HOW the interpreter arrived at the result (see the trace-capture sketch below)
- one could also train the model on how to install and invoke open source software, so that it learns about a variety of languages, versions and runtimes
- this might also improve its logical reasoning skills in general
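The sketch below is what I have in mind for the first bullet: generate a candidate program, actually run it, and keep the outcome (pass or fail, plus the error text) as a training example. `generate_candidate` and `record_example` are hypothetical placeholders for the model call and the dataset writer, not real APIs.

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str, timeout: int = 5) -> tuple[bool, str]:
    """Run a candidate program in a subprocess and return (passed, feedback)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        ok = result.returncode == 0
        feedback = result.stdout if ok else result.stderr
    except subprocess.TimeoutExpired:
        ok, feedback = False, "timed out"
    finally:
        os.unlink(path)
    return ok, feedback

def self_training_loop(tasks, generate_candidate, record_example):
    """For each task, ask the model for code, execute it, and store the
    (task, code, ok, feedback) tuple -- failures are training signal too."""
    for task in tasks:
        code = generate_candidate(task)           # hypothetical model call
        ok, feedback = run_candidate(code)
        record_example(task, code, ok, feedback)  # hypothetical dataset writer
```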
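For the execution-trace bullet, Python's standard `sys.settrace` hook already gives line-by-line visibility into what the interpreter did. This is only a toy illustration of what one "trace" training example could contain:

```python
import sys

def capture_trace(func, *args):
    """Run func(*args) under sys.settrace and return a log of
    (function name, line number, locals) for every executed line --
    the kind of 'how the interpreter got there' data I mean."""
    log = []

    def tracer(frame, event, arg):
        if event == "line":
            log.append((frame.f_code.co_name,
                        frame.f_lineno,
                        dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, log

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, trace = capture_trace(gcd, 48, 18)
# trace holds one entry per executed line, e.g. ('gcd', <lineno>, {'a': 18, 'b': 12})
# (line numbers depend on where gcd is defined in the file)
```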
Admittedly, running programs is a lot more expensive than doing simple next-word prediction on pre-existing texts.
On the other hand, a corpus of a million program executions can also be used to train future LLMs. You can keep the execution information forever and re-use it as traditional next-token prediction input; a toy example of how such a record might be flattened back into text is below.
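This is the kind of reuse I mean: once an execution has been recorded, it can be turned back into plain text and fed to any future model as ordinary next-token prediction data. The record layout here (`source`, `stdout`, `trace`) is invented for illustration:

```python
import json

def execution_to_training_text(record: dict) -> str:
    """Flatten one stored execution record into plain training text.
    The field names ('source', 'stdout', 'trace') are made up; 'trace'
    is assumed to be a list of (function, line, locals) tuples."""
    parts = [
        "### SOURCE\n" + record["source"],
        "### STDOUT\n" + record["stdout"],
        "### TRACE\n" + "\n".join(
            f"{fn}:{line} locals={json.dumps(locs, default=str)}"
            for fn, line, locs in record["trace"]
        ),
    ]
    return "\n\n".join(parts)
```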