Wizards and PPO

Hello

I am u/nurgle100, and I have been working on and off on a Deep Reinforcement Learning project [GitHub] for the last five years. Unfortunately, I have hit a wall, so I am posting here to show my progress and to see whether any of you are interested in taking a look, offering suggestions, or even collaborating with me.

The idea is very simple: I wanted to code an agent for Wizard, the card game. If you have never heard of it before: it is, in a nutshell, a trick-taking card game where you announce the number of tricks you expect to win each round. You gain points if you take exactly that many tricks and lose points otherwise.
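For readers who have not played, the bid-and-score rule can be sketched in a few lines. This assumes the commonly published Wizard scoring (20 points plus 10 per trick for an exact bid, minus 10 per trick of difference otherwise). The function name `wizard_round_score` is made up for illustration and is not taken from the project.

```python
def wizard_round_score(bid: int, tricks_won: int) -> int:
    """Score one round, assuming the commonly published Wizard rules."""
    if bid < 0 or tricks_won < 0:
        raise ValueError("bid and tricks_won must be non-negative")
    if bid == tricks_won:
        # Exact bid: 20 points plus 10 for every trick taken.
        return 20 + 10 * tricks_won
    # Missed bid: lose 10 points for every trick over or under the bid.
    return -10 * abs(bid - tricks_won)
```

Under this rule, bidding zero and taking zero tricks still earns 20 points, which is part of why bidding is a real decision and not just an estimate of hand strength.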

Unfortunately I have not yet succeeded at making the computer play well enough to beat my friends, but here is what I have done so far:

I have implemented the game in Python as a Gymnasium environment, along with a number of algorithms that I thought would be interesting to try. The current approach is to run the Stable Baselines 3 implementation of Proximal Policy Optimization (PPO), first against randomly acting opponents and then against other versions of itself. In theory, training would continue until the agent surpasses human-level play.
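The "random opponents first, then earlier versions of itself" schedule can be sketched without any RL library. This is only an illustration under assumptions: `OpponentPool`, `random_policy`, and the `(observation, legal_actions)` policy signature are hypothetical names, not the project's code or part of Stable Baselines 3.

```python
import random
from typing import Callable, List, Sequence

# Assumed interface: a policy maps (observation, legal_actions) to one action.
Policy = Callable[[object, Sequence[int]], int]


def random_policy(observation: object, legal_actions: Sequence[int]) -> int:
    """Baseline opponent: pick uniformly among the legal actions."""
    return random.choice(list(legal_actions))


class OpponentPool:
    """Hypothetical helper: random opponents first, then frozen past selves."""

    def __init__(self, max_snapshots: int = 10) -> None:
        self.max_snapshots = max_snapshots
        self.snapshots: List[Policy] = []

    def add_snapshot(self, policy: Policy) -> None:
        """Store a frozen copy of the learner, dropping the oldest when full."""
        self.snapshots.append(policy)
        if len(self.snapshots) > self.max_snapshots:
            self.snapshots.pop(0)

    def sample(self, n_opponents: int) -> List[Policy]:
        """Return the opponents for one game."""
        if not self.snapshots:
            # Phase 1: no snapshots yet, so every opponent acts randomly.
            return [random_policy] * n_opponents
        # Phase 2: each seat is filled by a randomly chosen past version.
        return [random.choice(self.snapshots) for _ in range(n_opponents)]
```

Sampling from several past snapshots instead of only the latest one is a common way to keep the learner from overfitting to a single opponent, though whether it helps here is an open question.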

So now about the wall that I have been hitting:

Because Deep Reinforcement Learning (and PPO is no exception here) is incredibly resource- and time-consuming, training these agents has turned out to be quite a challenge. I have run it on my GeForce RTX 3070 for a month and a half without achieving the desired results. The trained agent shows consistent improvement, but not enough to compete with an experienced human player.

It's possible that an agent trained with PPO the way I have been doing it is not capable of achieving better-than-human performance in Wizard.

But there are a number of things I have thought of that could still bring some hope:

- Pre-training the agent on human data. This is possible, but I haven't looked into where I could acquire data like this.

- There might be a better way to pass information from the environment to the agent. This might be a bit harder to explain so I'll elaborate when I write a more detailed post.

- Actual literature research - I have not seriously looked into machine learning literature on trick-taking card games so there might be some helpful publications on this topic.
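On the second point above (how information is passed from the environment to the agent), one common baseline is a fixed-length multi-hot vector over the deck. The sketch below assumes the standard 60-card Wizard deck (52 suited cards plus 4 Wizards and 4 Jesters). The suit names, the card tuple format, and the function names are placeholders, and this is not necessarily how the project encodes observations.

```python
from typing import Iterable, List, Tuple

SUITS = ("clubs", "diamonds", "hearts", "spades")  # placeholder suit names
RANKS = tuple(range(2, 15))                        # 2..14, ace high
N_SUITED = len(SUITS) * len(RANKS)                 # 52 suited cards
DECK_SIZE = N_SUITED + 4 + 4                       # plus 4 Wizards and 4 Jesters

# Assumed card format: ("hearts", 12), ("wizard", 0..3) or ("jester", 0..3).
Card = Tuple[str, int]


def card_index(card: Card) -> int:
    """Map a card to a unique position in [0, DECK_SIZE)."""
    kind, value = card
    if kind in ("wizard", "jester"):
        if not 0 <= value < 4:
            raise ValueError(f"invalid special card: {card!r}")
        offset = 0 if kind == "wizard" else 4
        return N_SUITED + offset + value
    # tuple.index raises ValueError for an unknown suit or rank.
    return SUITS.index(kind) * len(RANKS) + RANKS.index(value)


def encode_cards(cards: Iterable[Card]) -> List[int]:
    """Multi-hot encode a set of cards (e.g. a hand) as a 0/1 vector."""
    vec = [0] * DECK_SIZE
    for card in cards:
        vec[card_index(card)] = 1
    return vec
```

The same function could encode the hand, the cards already played, and the current trick as separate vectors that are concatenated into one observation. Whether that carries enough information for good bidding is exactly the open question.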

If you are interested in the code or the project and have trouble installing it, I would be happy to help!

- It's a good way to make the install guide more inclusive.

submitted by /u/nurgle100