Improving an Outer-Open Gomoku Program by Combining the MuZero Algorithm with Consecutive Winning Moves
Date
2022
Abstract
In 2019, the MuZero algorithm developed by DeepMind used "zero-knowledge" learning to move artificial intelligence toward more general research problems. The original Outer-Open Gomoku program built on Muzero-general with this algorithm estimates only the final state of the game during model training, which adds considerable uncertainty to training. This study therefore attempts to improve that program with consecutive winning moves. Threat moves are a very important way to win in Outer-Open Gomoku, and consecutive winning moves are the winning moves obtained when threat moves are used correctly. Based on the principle of consecutive winning moves, this study designs three improvement methods. They differ in whether consecutive winning moves found by threat-space search are provided during play, and in how different threat-space search designs are combined with rewards for consecutive winning moves in different situations. The experimental results show that, given the same training time, all three methods improve on the original version. Among them, the second method, whose threat-space search design also includes active offensive moves, is the strongest.
Keywords: MuZero, Neural Network, Threats-Space Search, Consecutive Winning Moves
Keywords
MuZero, Neural Network, Threats-Space Search, Consecutive Winning Moves