Teaching LLMs to be more reasonable

Based on a bit of research and a lot of gut feeling, I offer the following speculation:

  • if you self-trained an LLM against a Python interpreter or Java compiler in a feedback loop, where it learned from its own mistakes, it could become dramatically better at coding. It's actually something of a miracle that LLMs are "decent" at coding at all despite getting virtually no feedback from an interpreter or compiler. (A rough sketch of such a loop follows this list.)
  • one could train not merely on input and output, but also on an execution trace, so the LLM learns HOW the interpreter arrived at the result (see the second sketch below)
  • one could also train the model on how to install and invoke open-source software, so it would learn about a variety of languages, versions, and runtimes
  • this might also improve its logical reasoning skills in general
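
To make the first bullet concrete, here is a minimal sketch of an interpreter-in-the-loop data collection step. The generate() function is a hypothetical stand-in for the model, not a real API; the rest is standard library.

```python
import subprocess
import sys
import tempfile

def generate(prompt: str) -> str:
    # Hypothetical stand-in for the model's code generation; not a real API.
    return 'print("hello")'

def run_python(code: str) -> tuple[bool, str]:
    """Run candidate code in a subprocess and capture the interpreter's feedback."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=10)
    return proc.returncode == 0, proc.stdout + proc.stderr

# Collect (prompt, code, feedback) records; a real setup would sandbox the
# execution and then fine-tune the model on what it got right and wrong.
training_examples = []
for prompt in ["Write a script that prints 'hello'"]:
    code = generate(prompt)
    passed, feedback = run_python(code)
    # Failures are kept too: the traceback is exactly the learning signal.
    training_examples.append({"prompt": prompt, "code": code,
                              "passed": passed, "feedback": feedback})
```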

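And a sketch of the second bullet: Python's sys.settrace hook is one way to capture a line-by-line execution trace that could be turned into training text. The trace format here is made up purely for illustration.

```python
import sys

def trace_execution(code: str) -> list[str]:
    """Record a line-by-line trace of executing `code`, so the trace itself
    can become training text rather than just the final output."""
    events = []

    def tracer(frame, event, arg):
        if event == "line":
            local_vars = {k: v for k, v in frame.f_locals.items()
                          if not k.startswith("__")}
            events.append(f"line {frame.f_lineno}: locals={local_vars}")
        return tracer

    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return events

# The trace shows HOW the interpreter got the result, not just what it printed.
print("\n".join(trace_execution("x = 2\ny = x * 3\nprint(y)")))
```
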
Admittedly, running programs is a lot more expensive than doing simple next-word prediction on pre-existing texts.

But on the other hand, a corpus of a million program executions can also be used to train future LLMs: you can keep the execution information forever and reuse it as traditional next-token-prediction input.
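
For example, a stored execution record (like the ones collected in the first sketch) could be flattened back into plain text along these lines; the field names and section markers are arbitrary choices, not a standard format.

```python
def to_training_text(record: dict) -> str:
    """Flatten one stored execution record into plain next-token-prediction text.
    The field names match the earlier sketch; the section markers are arbitrary."""
    return (f"### Task\n{record['prompt']}\n"
            f"### Code\n{record['code']}\n"
            f"### Interpreter feedback\n{record['feedback'] or '(no output)'}\n"
            f"### Passed: {record['passed']}\n")
```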

submitted by /u/Smallpaul