The Research of Chinese Checkers Program with Deep Learning (中國跳棋對局程式研發與深度學習之探討)

林順喜; 陳俊豪 (Chern, Jiunn-Haur)
2019-09-05; 2021-02-27; 2019-09-05; 2019
http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22G060547026S%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106490

Keywords: computer games, Chinese Checkers, Monte Carlo Tree Search, AlphaZero

Chinese Checkers is a well-known board game, but little research has addressed improving the playing strength of computer programs for it. Monte Carlo Tree Search (MCTS) has served as the main algorithm for two-player Chinese Checkers AI and already plays at a reasonable level, but it leaves room for improvement. The goal of the game is to be the first to move all of one's pieces into the opposite corner. Besides advancing pieces quickly, a player must also retreat pieces at the right time to balance attack and defense.

This thesis improves a two-player Chinese Checkers program with deep learning, using a convolutional neural network trained within the AlphaZero framework. To obtain results with limited hardware resources and time, we add improvements tailored to the game's characteristics. First, MCTS with random playouts generates game records from a variety of openings, which serve as training data for a pre-trained model. That model then continues learning through self-play reinforcement learning, which avoids the problem of an initially weak network being unable to finish a game. Finally, in the late stages of the game we switch to a single-player search, which addresses the problem that the program does not always select the best move in positions already known to be won or lost.
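
The abstract does not describe how the AlphaZero-style search chooses moves. As a rough illustration only, the sketch below shows the PUCT selection rule commonly used in AlphaZero-style MCTS, where each child is scored by its mean value plus an exploration bonus weighted by the network's prior. The function name `puct_select`, its parameters, and the constant `c_puct=1.5` are assumptions made for this example and are not taken from the thesis.

```python
import math

def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Return the index of the child with the highest PUCT score.

    priors       -- policy-network prior P(s, a) for each child
    visit_counts -- visit count N(s, a) for each child
    value_sums   -- accumulated value W(s, a) for each child
    c_puct       -- exploration constant (assumed value, not from the thesis)
    """
    # The parent is counted as visited once when it is expanded,
    # so the prior still matters before any child has been visited.
    parent_visits = 1 + sum(visit_counts)
    best_index, best_score = -1, -math.inf
    for i, (p, n, w) in enumerate(zip(priors, visit_counts, value_sums)):
        q = w / n if n > 0 else 0.0                           # mean value Q(s, a)
        u = c_puct * p * math.sqrt(parent_visits) / (1 + n)   # exploration bonus
        score = q + u
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With no visits yet, the rule follows the prior; as visits accumulate, rarely visited children receive a larger bonus, and among equally visited children the one with the higher mean value is preferred.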