Hey everyone, I wrote a hands-on guide for fine-tuning LLMs with GRPO (Group Relative Policy Optimization) locally on Windows, using Hugging Face's TRL library. My goal was to create a practical workflow that doesn't require Colab or Linux. The guide and the accompanying script focus on:
This is for anyone looking to experiment with reinforcement learning techniques on their own machine.

Read the blog post: I'm open to any feedback. Thanks!

P.S. I'm currently looking for my next role in the LLM / Computer Vision space and would love to connect about any opportunities.

Portfolio: Pavan Kunchala - AI Engineer & Full-Stack Developer.