Lin, Shun-Shii (林順喜); Chen, Chih-Hung (陳志宏). 2021.
Improving the AlphaZero Algorithm in the Playing and Training Phases (在下棋與訓練階段改進AlphaZero演算法). Academic thesis.
https://etds.lib.ntnu.edu.tw/thesis/detail/965feca0896757a0aadfca6515239c49/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117352

Abstract: AlphaZero has achieved great success across many challenging games, but it requires enormous computational power to train a strong model. Rather than investing such massive resources, we focus on improving the performance of AlphaZero itself. In this work, we introduce seven major enhancements to AlphaZero. First, the AlphaZero-miniMax Hybrids strategy combines the modern AlphaZero approach with traditional minimax search to strengthen the program. Second, the Proven-mark strategy prunes unneeded moves to avoid the re-sampling problem and increase the opportunity to explore promising moves. Third, the Quick-win strategy distinguishes rewards according to the length of the game-tree search, no longer treating all wins (or losses) equally. Fourth, the Best-win strategy resolves an inaccurate win-rate problem by backing up the best reward rather than the average. Fifth, the Threat-space-reduction strategy improves the performance of neural-network training under limited resources. Sixth, the Big-win strategy takes the point margin of the final outcome into consideration instead of simply labeling games win/loss/draw. Finally, the Multistage-training strategy improves the quality of the neural network for multistage games. After years of work, we obtained promising results that have already improved the performance of the AlphaZero algorithm on several test domains.

Keywords: AlphaZero-miniMax Hybrids; Proven-mark strategy; Quick-win strategy; Best-win strategy; Threat-space-reduction; Big-win strategy; Multistage-training strategy
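To make two of these ideas concrete, the Python sketch below contrasts the standard average (mean-value) backup used by AlphaZero with a Best-win-style backup that keeps the best reward seen, and shows a Quick-win-style reward that shrinks with game length. All names and the specific scaling formula here are hypothetical illustrations of the ideas named in the abstract, not the thesis's actual implementation.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT taken
# from the thesis; they merely illustrate the ideas named in the abstract.

from dataclasses import dataclass


@dataclass
class Node:
    """A minimal MCTS node holding visit statistics."""
    visits: int = 0
    value_sum: float = 0.0    # used by the standard average backup
    best_value: float = -1.0  # used by a Best-win-style backup (max reward seen)

    def backup_average(self, reward: float) -> None:
        # Standard AlphaZero-style backup: the node's value is the mean reward,
        # which can dilute a proven win among many losing samples.
        self.visits += 1
        self.value_sum += reward

    def backup_best(self, reward: float) -> None:
        # Best-win idea (as described in the abstract): keep the best reward
        # seen so far instead of averaging, avoiding an inaccurate win rate.
        self.visits += 1
        self.best_value = max(self.best_value, reward)

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def quick_win_reward(win: bool, ply: int, max_ply: int = 200) -> float:
    """Quick-win idea: a win reached in fewer plies earns a larger reward,
    so the search prefers short wins (and, symmetrically, long losses).
    The linear scaling here is an assumed placeholder."""
    magnitude = 1.0 - 0.5 * (ply / max_ply)  # shrinks as the game gets longer
    return magnitude if win else -magnitude
```

For example, a node that sees rewards [-1, -1, +1] has a mean value of -1/3 under the average backup but +1 under the Best-win backup, which is why the abstract describes the latter as correcting an inaccurate win rate once a winning line is found.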