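林順喜 Lin, Shun-Shii
陳品源 Chen, Pin-Yuan
2020-10-19
2025-03-02
2020-10-19
2020
http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22G060647082S%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/111721

In March 2016, DeepMind's AlphaGo program defeated Lee Sedol, then a Korean 9-dan professional Go player, by a score of 4:1, a major breakthrough for game-playing AI programs on the path of reinforcement learning. In October 2017, DeepMind proposed the AlphaGo Zero method, which beat the original AlphaGo Lee program 100:0 and showed that a Go program stronger than humans can be trained without human game records as prior knowledge. DeepMind finally generalized the AlphaGo Zero method into the AlphaZero method and trained the strongest chess and shogi programs in the world at the time. To reach that strength, however, DeepMind trained with an enormous amount of computing resources.

The game studied in this thesis is 5五將棋 (MiniShogi), invented by 楠本茂信 in 1970. MiniShogi is a shogi variant played on a 5×5 board, smaller than the 9×9 board of standard shogi, so it is well suited to implementing a game-playing AI program when hardware resources are limited.

This experiment uses the AlphaZero algorithm with the AlphaZero General framework to implement an AI program trained with a neural network and reinforcement learning. We also apply some known advantageous strategies as improvements, so that the training efficiency of the neural network model can be raised under limited hardware resources.

In training the MiniShogi program, we use two methods as improvements. The first samples training examples according to the importance of the board position: middle-game positions are given a higher sampling probability than endgame and opening positions, in the hope that the neural network learns to play the middle game better than the ordinary version does.

The second trains with the rule of winning immediately whenever a win is available: by seeing the terminal position one turn in advance, it achieves the effect of Winning Attack. When MCTS plays, it does not necessarily choose a move that decides the game even when such a move is available, which makes the neural network weights converge very slowly. With this method, convergence can be faster than with the ordinary training method.

Of the two methods adopted in this research, one succeeded and one failed. According to the experimental data, good sampling has a chance of improving playing strength, but apart from one set of data the results were unsatisfactory. The strength improvement from Winning Attack, by contrast, was very significant. When the two methods are combined in training, playing strength also improves, but the two methods do not reinforce each other.

In March 2016, DeepMind's AlphaGo program defeated the Korean 9-dan professional Go player Lee Sedol 4:1, a major breakthrough for game-playing AI in the field of reinforcement learning. DeepMind then proposed the AlphaGo Zero method, which defeated the original AlphaGo Lee program 100:0 and showed that a Go program stronger than humans can be trained without human game records as prior knowledge. DeepMind finally generalized the AlphaGo Zero method into the AlphaZero method and trained the strongest chess and shogi programs in the world at the time. However, DeepMind used enormous computing resources to reach that strength. The game studied in this thesis is MiniShogi, invented by Shigenobu Kusumoto (楠本茂信) in 1970.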
MiniShogi is a variant of Shogi played on a smaller 5×5 board, whereas standard Shogi uses a 9×9 board, so MiniShogi is well suited to implementing an AI program with limited hardware resources. Our experiment uses the AlphaZero General framework to implement an AI program whose neural network is trained by reinforcement learning. We also apply some known advantageous strategies to improve its performance. In training the MiniShogi program, we used two methods. The first is to sample training examples according to the importance of the board position. We set the sampling probability of middle-game positions higher than that of endgame and opening positions, so that the neural network learns to play the middle game better than the original version does. The second is the Winning Attack training method. By looking one turn ahead for a terminal position, it achieves the effect of "winning directly". We observed that when MCTS plays, even if a move that decides the game is available, it does not always play that winning move. This causes the weights of the neural network to converge slowly. With our method, the weights may converge faster than with the ordinary training method. Of the two methods used in this research, one succeeded and one failed. The experimental data suggest that well-chosen samples have a chance of improving playing strength, but apart from one set of data the results were unsatisfactory. The improvement from Winning Attack, by contrast, was very significant. Training with both methods together also improves playing strength, but the two methods do not reinforce each other.

computer games, MiniShogi, Monte Carlo Tree Search, neural network, deep learning, reinforcement learning

利用AlphaZero框架實作與改良MiniShogi程式 (Implement and Improve a MiniShogi Program Using the AlphaZero Framework)
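
The two training modifications described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation and does not use the AlphaZero General API: every name below (`stage_weight`, `sample_training_positions`, `pick_move`, the `mid_weight` parameter, and the callbacks passed to `pick_move`) is hypothetical, and the "middle third of the game" definition of the middle game and the weight of 2.0 are assumptions chosen only for illustration.

```python
import random


def stage_weight(move_index, game_length, mid_weight=2.0):
    """Hypothetical sampling weight for one position.

    Positions in the middle third of a game get `mid_weight`;
    opening and endgame positions get 1.0.
    """
    if game_length <= 0:
        raise ValueError("game_length must be positive")
    # Integer comparison avoids floating-point edge cases at the thirds.
    in_middle = game_length <= 3 * move_index < 2 * game_length
    return mid_weight if in_middle else 1.0


def sample_training_positions(examples, k, rng=random):
    """Draw k examples (with replacement), favouring middle-game positions.

    `examples` is a list of (position, move_index, game_length) tuples.
    """
    weights = [stage_weight(i, n) for _, i, n in examples]
    return rng.choices(examples, weights=weights, k=k)


def pick_move(state, legal_moves, apply_move, is_win, search_move):
    """Winning Attack sketch: look one move ahead for an immediate win.

    If some legal move leads to a won terminal position, play it directly;
    otherwise fall back to the normal search (for example MCTS), which is
    passed in as the `search_move` callback.
    """
    for move in legal_moves(state):
        if is_win(apply_move(state, move)):
            return move
    return search_move(state)
```

In a toy counting game where a move adds 1 or 2 and reaching exactly 10 wins, `pick_move` returns 2 from state 8 because that move wins at once, and defers to the fallback search from state 5, where no immediate win exists.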