Need help training a PPO NN to learn how to play my deckbuilding game

Hey, I have a roguelike deckbuilding game that I want to train an agent to play using pure RL, with no supervised data. I chose PPO because, to my amateur knowledge, it is the most fitting algorithm.

I have a very large categorical input space (basically, which cards are in the deck and which cards are being offered as picks), and I need the agent to learn the best picks.
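
To make the pick side concrete, here is a minimal, framework-free sketch of one common way to restrict a policy to the cards actually on offer (action masking). It assumes the policy emits one logit per card ID in the pool; `masked_softmax` and the example numbers are hypothetical illustrations, not code from the game.

```python
import math

def masked_softmax(logits, offered_ids):
    """Softmax over per-card logits, restricted to the cards on offer.

    Assumption: logits[i] is the policy's score for card ID i.
    """
    offered = set(offered_ids)
    if not offered:
        raise ValueError("at least one card must be offered")
    # Cards that are not on offer get -inf, so their probability is exactly 0.
    masked = [x if i in offered else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pool of 5 cards, of which cards 1, 2 and 4 are offered.
probs = masked_softmax([2.0, 0.5, -1.0, 3.0, 0.0], offered_ids=[1, 2, 4])
```

With a mask like this, the agent cannot sample a card that is not offered, so it never spends experience on invalid picks.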

I tried using an embedding layer: the inputs are the cards the player has plus the offered cards, with the numerical data concatenated to the embedding output. I played around with various hyperparameters, but so far the agent has not shown any learning.
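
For readers trying to picture that input pipeline, below is a minimal, framework-free sketch of one way such an observation could be assembled. It is an assumption about the setup, not the actual code: the deck is mean-pooled into a fixed-size vector, each offered card keeps its own embedding, and the numeric features are appended. `NUM_CARDS`, `EMBED_DIM`, `mean_pool` and `build_observation` are hypothetical names, and the embedding table is random here, whereas in a real model it would be trained.

```python
import random

NUM_CARDS = 500  # hypothetical size of the card pool
EMBED_DIM = 8    # hypothetical embedding width

rng = random.Random(0)
# One vector per card ID; in a real model these would be trainable parameters.
embedding = [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
             for _ in range(NUM_CARDS)]

def mean_pool(card_ids):
    """Average the embeddings of a variable-size card list into one vector."""
    if not card_ids:
        return [0.0] * EMBED_DIM
    vecs = [embedding[c] for c in card_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def build_observation(deck_ids, offered_ids, numeric):
    """Concatenate pooled deck, per-offer embeddings, and numeric features."""
    obs = mean_pool(deck_ids)      # fixed size regardless of deck length
    for c in offered_ids:
        obs.extend(embedding[c])   # one slot per offered card
    obs.extend(numeric)            # e.g. normalized HP or gold (hypothetical)
    return obs

obs = build_observation(deck_ids=[3, 3, 17, 42],
                        offered_ids=[5, 99, 250],
                        numeric=[0.5, 0.8])
```

Pooling keeps the observation length constant as the deck grows, which a plain concatenation of per-card embeddings would not.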

Any help or advice would be greatly appreciated, thanks!

submitted by /u/Jagerjj