GTPO: a more stable alternative to GRPO for LLM training
Paper, GitHub, Colab

GRPO has some key issues:

- Tokens show up in both positive and negative completions, which leads to conflicting updates that break structure.
- Negative completions push the model toward unlikely tokens, flattening the distribut…
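
To make the first issue concrete, here is a small toy sketch (not the paper's code) of how a token can end up with conflicting updates under group-relative advantages. It assumes the common GRPO normalization `(reward - mean) / std` over a group of completions, uses the population standard deviation for simplicity, and treats completions as plain token lists. The helper names `group_advantages` and `conflicting_tokens` are hypothetical and exist only for this illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-relative advantages, (r - mean) / std.

    Assumption: population std; real implementations may use the
    sample std or add a small epsilon to the denominator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored the same: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def conflicting_tokens(completions, advantages):
    """Tokens that occur in at least one positive-advantage completion
    and at least one negative-advantage completion."""
    pos, neg = set(), set()
    for tokens, adv in zip(completions, advantages):
        if adv > 0:
            pos.update(tokens)
        elif adv < 0:
            neg.update(tokens)
    return pos & neg


# Toy group: two completions that share formatting tokens but differ
# in the final answer. Rewards are made up for illustration.
completions = [
    ["<think>", "2", "+", "2", "=", "4", "</think>"],
    ["<think>", "2", "+", "2", "=", "5", "</think>"],
]
rewards = [1.0, 0.0]

advantages = group_advantages(rewards)
shared = conflicting_tokens(completions, advantages)
print(advantages)
print(sorted(shared))
```

In this toy group the shared formatting tokens such as `<think>` are pushed up by the rewarded completion and pushed down by the penalized one, while only the differing answer tokens (`4` versus `5`) get an unambiguous signal.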