Supervised learning and samples #140
Doing so is not my priority right now but I would be happy to welcome contributions here.
See
Thanks!
Previous versions will still be accessible from git or the package manager.
What's the main reason for rewriting AlphaZero.jl?
Thanks!
Could you please add a supervised learning feature in the next release, so that we can feed in human-played games instead of self-play games?
I will keep this in mind, although I cannot make any promises right now.
Please! We really need it. Your AlphaZero.jl is a WONDERFUL project. I must say you're a genius.
Can you tell me more about how you or your company are using AlphaZero.jl, and for what game/environment?
It's a non-commercial, educational project. I teach kids to play a board game (of the mancala family). We don't have good software, so sometimes we don't even know where a player made a mistake. With AlphaZero.jl I created a bot that plays pretty strongly, and the "explore" function gives us an idea of which moves are good and which are bad.
Thanks for the testimony. It is great to hear that AlphaZero.jl is being used successfully in an educational project. |
The bot plays pretty strongly, but still leaves much to be desired. So I was thinking about supervised learning. I have thousands of games played by masters.
How can I define these values? All I have is thousands of games with moves and results. They weren't played using MCTS, so I don't have the π values, etc. Frankly speaking, I'm very confused.
I'm not a professional Julia programmer. I had to learn Julia to create a bot based on AlphaZero.jl.
First of all, a word of warning. I understand that you are not a trained programmer, and it is all the nicer for me to learn that you were still able to use this package on your own game. That being said, an algorithm such as AlphaZero can hardly be used as a black box: the moment you try to do something a bit unusual, there is no escaping understanding the codebase and the underlying algorithm. In the long run, you may want to take the time to improve your Julia skills, read a bit about machine learning and AlphaZero, and then try to understand the codebase as a whole. Regarding your current question: if you have a database of games played by humans, you can extract samples from it in the following way. In state
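As a rough illustration of what such sample extraction could look like, here is a hedged Julia sketch. All names here (`Sample`, `samples_from_game`, `encode_state`, `apply_move`) are hypothetical placeholders, not the AlphaZero.jl API: each position in a recorded game becomes one sample, with a one-hot policy target on the move the human actually played, and the final result, seen from the player to move, as the value target.

```julia
# Hypothetical sketch: convert one recorded human game into AlphaZero-style
# training samples. Each sample is (s, π, z), where `s` encodes the position,
# `π` is the target policy and `z` the game outcome for the player to move.
# `encode_state`, `apply_move` and the state type are placeholders.

struct Sample
    s::Vector{Float32}   # encoded board position
    π::Vector{Float32}   # target policy (here: one-hot on the move played)
    z::Float32           # final outcome for the player to move (+1 / 0 / -1)
end

function samples_from_game(moves::Vector{Int}, result::Float32,
                           num_actions::Int, encode_state, apply_move,
                           initial_state)
    samples = Sample[]
    state = initial_state
    z = result               # outcome from the first player's perspective
    for m in moves
        π = zeros(Float32, num_actions)
        π[m] = 1f0            # human move treated as a one-hot policy target
        push!(samples, Sample(encode_state(state), π, z))
        state = apply_move(state, m)
        z = -z                # flip the perspective for the next player
    end
    return samples
end
```

In a zero-sum two-player game the sign flip is what makes a single game result usable as a value target at every ply; for games with draws, `z = 0` simply propagates unchanged.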
The idea was suggested by Jonathan:
"I guess what you'll have to do is generate many samples of the kind that are stored in AlphaZero's memory buffer. You can take these samples either from human play data or have other players play against each other to generate data. If you do so, be careful to add some exploration so that the same game is not played again and again and you get some diversity in your data. Once you've got the data, you can either use the Trainer utility in learning.jl or just write your training procedure yourself in Flux."
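To make the "write your training procedure yourself" option more concrete, here is a toy sketch. It is not the Trainer utility from learning.jl: a plain linear policy trained by softmax cross-entropy with a manual gradient stands in for the real network, and `sgd_step!`/`train!` are illustrative names. With AlphaZero.jl you would train its Flux network on the same kind of (state, policy, value) targets instead.

```julia
# Toy sketch of a hand-written supervised training loop on (s, π) pairs.
# A linear policy (logits = W * s) replaces the real network here.

softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))

# One SGD step. For softmax cross-entropy the gradient has the simple
# closed form ∇_W CE(π, softmax(W * s)) = (p - π) * sᵀ with p = softmax(W * s).
function sgd_step!(W::Matrix{Float32}, s::Vector{Float32},
                   π::Vector{Float32}; lr::Float32 = 0.1f0)
    p = softmax(W * s)
    W .-= lr .* ((p .- π) * s')
    return W
end

# One pass over a dataset of (s, π) pairs.
function train!(W, samples; lr = 0.1f0)
    for (s, π) in samples
        sgd_step!(W, s, π; lr = Float32(lr))
    end
    return W
end
```

Repeated steps push the policy toward the one-hot targets; a real setup would add a value head and a mean-squared-error term on `z`.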
Did anyone implement it? I still don't understand in which format the games and moves are stored in the memory buffer.