Development and Study of a Surakarta Game Program Based on Reinforcement Learning

Date

2019

Abstract

Surakarta is a two-player zero-sum board game that originated on the island of Java, Indonesia. Its original name, Permainan, simply means "game" in Indonesian; the French later named it Surakarta, after the ancient city of Surakarta (Solo) in central Java. The game's most distinctive feature is its unique capturing method, which uses the inner and outer circuits around the board. A player wins by capturing all of the opponent's pieces.

Besides being played by people, Surakarta is a regular event at the Computer Olympiad, and many strong programs have been developed over the years. In the past two years, AlphaGo and AlphaZero have brought computer game playing to a new milestone, creating an opportunity to raise the playing strength of Surakarta programs.

This study applies the AlphaZero architecture, with different parameters and architectural improvements, to train and implement a Surakarta AI engine, together with a visualization platform for the game. In addition to the original single-network version, the study explores a new multi-network architecture that divides the game into three phases and trains a separate neural network for each: the "Opening Network", the "Middle Network", and the "Ending Network". The AlphaZero algorithm using the Ending Network was validated against a DTC endgame tablebase, and its accuracy relative to the tablebase was as high as 99%.
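
As an illustration of the three-network idea described in the abstract, the following minimal Python sketch routes a position to an opening, middle, or ending network and computes an agreement rate against tablebase-optimal moves. It is not the thesis implementation: the abstract does not say how the phases are delimited, so the piece-count thresholds, the flat board encoding, and all names (`select_phase`, `evaluate`, `agreement_rate`) are assumptions made for this example.

```python
from enum import Enum
from typing import Callable, Dict, Sequence, Set, Tuple


class Phase(Enum):
    OPENING = "opening"
    MIDDLE = "middle"
    ENDING = "ending"


# Hypothetical thresholds on the total number of pieces left on the board.
# Surakarta starts with 12 pieces per side (24 in total); these cut-offs are
# illustrative assumptions, not the values used in the study.
OPENING_MIN_PIECES = 20
ENDING_MAX_PIECES = 8

# Assumed board encoding: 36 cells, 0 = empty, 1 / -1 = the two players.
Board = Sequence[int]
# Assumed network interface: board -> (move probabilities, value estimate).
Network = Callable[[Board], Tuple[Sequence[float], float]]


def select_phase(total_pieces: int) -> Phase:
    """Map the number of pieces on the board to a game phase."""
    if total_pieces >= OPENING_MIN_PIECES:
        return Phase.OPENING
    if total_pieces <= ENDING_MAX_PIECES:
        return Phase.ENDING
    return Phase.MIDDLE


def evaluate(board: Board, networks: Dict[Phase, Network]) -> Tuple[Sequence[float], float]:
    """Evaluate a position with the network responsible for its phase."""
    total_pieces = sum(1 for cell in board if cell != 0)
    return networks[select_phase(total_pieces)](board)


def agreement_rate(engine_moves: Sequence[str], optimal_moves: Sequence[Set[str]]) -> float:
    """Fraction of positions where the engine's move is tablebase-optimal."""
    if len(engine_moves) != len(optimal_moves):
        raise ValueError("one set of optimal moves is needed per engine move")
    if not engine_moves:
        raise ValueError("at least one position is required")
    hits = sum(1 for move, best in zip(engine_moves, optimal_moves) if move in best)
    return hits / len(engine_moves)
```

In a search-based engine such as AlphaZero, a function like `evaluate` would be called at each leaf of the tree search, so a single search may consult more than one network when captures move the position across a phase boundary.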

Keywords

computer games, Surakarta, AlphaZero, neural network, deep learning
