Development and Study of an International Draughts Program Based on the AlphaZero Approach
Date
2020
Abstract
International draughts evolved from national checkers variants. It is said that in 1723 a Polish military officer living in France enlarged the board from 64 squares to 100, which is why the game is also called "Polish checkers". Its special rules, the flying king and multi-capture (continuous capturing), make play varied and engaging, and the game is widely popular.
In recent years the AlphaZero algorithm has been very successful at training AI for many board games, so this study uses the AlphaZero architecture to implement an AI for international draughts. However, a multi-capture follows a path through several squares, and a single neural network output cannot fully express that path. This study therefore designs continuous moves: the network outputs several successive moves, which together describe the complete capture path.
To improve the training efficiency of the AlphaZero-based draughts program, this study applies a Big-Win strategy to speed up training by steering the neural network toward wins by a large margin. After 100 training iterations, the model trained with the Big-Win strategy achieved a higher win rate than the model trained with the original AlphaZero method.
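The continuous-move idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the 1 to 50 numbering of the playable squares, the flat 50 x 50 (from, to) action index, and all function names are assumptions made here for illustration only. The sketch splits one capture path into single steps, each of which fits in one policy output.

```python
from typing import List, Tuple

# Assumption: the 50 playable squares of the 10x10 board are numbered 1..50.
NUM_SQUARES = 50

Step = Tuple[int, int]  # (from_square, to_square)


def split_capture_path(path: List[int]) -> List[Step]:
    """Split a path [s0, s1, ..., sk] into single steps (s0, s1), (s1, s2), ..."""
    if len(path) < 2:
        raise ValueError("a move needs at least a start and an end square")
    return list(zip(path, path[1:]))


def step_to_action(step: Step) -> int:
    """Map one (from, to) step to an index in a hypothetical 50*50 policy vector."""
    src, dst = step
    if not (1 <= src <= NUM_SQUARES and 1 <= dst <= NUM_SQUARES):
        raise ValueError("square out of range")
    return (src - 1) * NUM_SQUARES + (dst - 1)


def action_to_step(action: int) -> Step:
    """Inverse of step_to_action."""
    src, dst = divmod(action, NUM_SQUARES)
    return src + 1, dst + 1


def encode_path(path: List[int]) -> List[int]:
    """Encode a whole capture path as a sequence of single-step actions.

    The same player would keep the turn until the sequence is exhausted.
    """
    return [step_to_action(step) for step in split_capture_path(path)]


if __name__ == "__main__":
    # A double capture that lands on square 19 and then on square 10.
    actions = encode_path([28, 19, 10])
    print(actions)
    print([action_to_step(a) for a in actions])
```

Under these assumptions an ordinary move is simply a path of length two, so one action space covers both simple moves and multi-captures.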
Keywords
computer games, draughts, neural network, deep learning, AlphaZero