My team and I are currently trying to reproduce the reasoning capabilities of the o1 series, and we need some help from the community to obtain more data. We plan to build our research on two OpenAI papers: Let's Verify Step by Step (https://arxiv.org/pdf/2305.20050) and Prover-Verifier Games improve legibility of LLM outputs (https://arxiv.org/pdf/2407.13692). We will probably also use some form of tree search in our approach. Since we are a fairly small team, any help would be valuable, especially with obtaining math, reasoning, and code chain-of-thought data in which each step is labeled "correct", "neutral", or "incorrect". If you're interested in helping us, please comment under this post or send me a message on Reddit or Discord (danfosing).
Yes, all of our research, including the models, dataset, and training code, will be open sourced.
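
For anyone thinking about contributing data, here is a rough sketch of what one step-labeled chain-of-thought record might look like. Only the three labels come from the post; the field names (`problem`, `steps`, `text`, `label`), the helper `first_incorrect_step`, and the JSON serialization are assumptions for illustration, not a settled schema.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional

# The three step labels named in the post; the rest of this layout is hypothetical.
ALLOWED_LABELS = {"correct", "neutral", "incorrect"}


@dataclass
class Step:
    text: str   # one reasoning step in the chain of thought
    label: str  # must be one of ALLOWED_LABELS

    def __post_init__(self) -> None:
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


@dataclass
class LabeledSolution:
    problem: str
    steps: List[Step] = field(default_factory=list)

    def to_json(self) -> str:
        # One JSON object per solution, suitable for one line of a JSON Lines file.
        return json.dumps(asdict(self))


def first_incorrect_step(solution: LabeledSolution) -> Optional[int]:
    """Return the index of the first step labeled "incorrect", or None."""
    for i, step in enumerate(solution.steps):
        if step.label == "incorrect":
            return i
    return None


example = LabeledSolution(
    problem="Solve 2x + 3 = 11 for x.",
    steps=[
        Step("Subtract 3 from both sides: 2x = 8.", "correct"),
        Step("Note that 8 is an even number.", "neutral"),
        Step("Divide both sides by 2: x = 5.", "incorrect"),
    ],
)
```

Labeling every step, instead of only the final answer, is what lets a verifier be trained to point at where a solution goes wrong, so records in roughly this shape would be the most useful.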