Reinforcement Learning and Transfer Learning Applied to the Game of Hex
Date
2023
Abstract
Hex is a two-player board game that first appeared in a Danish newspaper in 1942 under the name Polygon. In 1948 the American mathematician John Forbes Nash Jr. independently reinvented it, and it became known as Nash. In 1952 the manufacturer Parker Brothers published it under the name Hex. Each pair of opposite edges of the board (top and bottom, left and right) is marked with one color. Players take turns placing stones, and the player who connects the two edges of their own color wins. Hex is a zero-sum game in which a draw is impossible. Previous research has solved the game for board sizes up to 9×9.
Since the advent of AlphaZero, game-playing programs have advanced further, and programs built with that method play strongly. In Hex, the MoHex program developed at the University of Alberta is especially noteworthy: it has consistently achieved excellent results in competitions and is still being improved.
This thesis applies reinforcement learning within the AlphaZero training framework, assisted by board positions solved by MoHex. Because training a model for a large board is costly, we combine it with transfer learning: solved small-board data is used so that the early self-play stage already produces better game records, rather than training from zero knowledge, with the aim of improving the training results of the large-board model. We also compare the effects of different parameter-transfer methods during transfer learning.
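The winning condition described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the board encoding (0 for empty, 1 and 2 for the two players), the choice that player 1 owns the top and bottom edges, the rhombic row/column layout, and the names `NEIGHBOURS` and `has_won` are all assumptions made for illustration.

```python
from collections import deque

# Assumed layout: a rhombic board stored as rows and columns, where each
# cell touches these six (row, col) offsets. This is a common convention,
# not necessarily the one used in the thesis.
NEIGHBOURS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]


def has_won(board, player):
    """Return True if `player` has connected their two edges.

    board  -- square list of lists holding 0 (empty), 1 or 2
    player -- 1 connects the top row to the bottom row,
              2 connects the left column to the right column (assumed)
    """
    n = len(board)
    if player == 1:
        starts = [(0, c) for c in range(n) if board[0][c] == 1]
    else:
        starts = [(r, 0) for r in range(n) if board[r][0] == 2]

    # Breadth-first search over the player's own stones.
    seen = set(starts)
    queue = deque(starts)
    while queue:
        r, c = queue.popleft()
        # Reaching the far edge completes the connection.
        if (r if player == 1 else c) == n - 1:
            return True
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n
                    and (nr, nc) not in seen
                    and board[nr][nc] == player):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False
```

Under this adjacency, a completely filled board always has exactly one winner, which is the no-draw property the abstract mentions.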
Keywords
Hex, Reinforcement Learning, Transfer Learning