Hey, I have a roguelike deckbuilding game I want to train an agent to play using pure unsupervised RL; I chose PPO as I understand (to my amateur knowledge) that is the most fitting algorithm.
I have a very large categorical space that I have to send in (basically what cards are in the deck and which cards are being offered to pick), and I need the agent to learn the best picks.
I attempted to use an embedding layer and input the cards the player has + given cards + numerical data (concated with the embedding output). I tried playing around with various hyperparameters, but so far, I have not been able to generate any learning.
Any help or advice would be greatly appreciated, thanks!
[link] [comments]