Comparing the Gumbel and KataGo Methods for Improving the Training Performance of AlphaZero on Outer-Open Gomoku
Date
2023
Authors
Abstract
The purpose of this research is to compare two methods, KataGo and Gumbel, in terms of how well they reduce the resources required for training while maintaining or improving training efficiency. KataGo is an improved version of the AlphaZero algorithm; its author introduced more efficient training techniques and a redesigned neural network architecture, and claims roughly a 50-fold reduction in computation over comparable methods. Gumbel, on the other hand, is a method proposed by DeepMind in 2022 that achieves significantly better results than other known algorithms under the same conditions by expanding only a small number of nodes during Monte Carlo Tree Search (MCTS).

In this research, we applied both methods to strengthen AlphaZero for Outer-Open Gomoku and compared their advantages, disadvantages, and effects. The experimental results show that both Gumbel and KataGo effectively improve the performance of AlphaZero when training an Outer-Open Gomoku program. We also found that, for the same number of training iterations, KataGo produces a stronger model than Gumbel, whereas within the same short training time, Gumbel produces a stronger model than KataGo.

In addition to examining the improvements introduced by AlphaZero, KataGo, and Gumbel, this research investigates two methods for accelerating self-play and two general methods for enhancing the training performance of all three algorithms. Applying the two self-play acceleration methods to the three algorithms yields an average speedup of 13.16 times, a significant improvement that substantially reduces training time. The two general methods for improving training performance also yield promising results, improving not only training efficiency but also the learning ability and accuracy of the models.

These results demonstrate that the KataGo and Gumbel improvements to AlphaZero can significantly enhance the training effectiveness and speed of an Outer-Open Gomoku AI while reducing the required training resources. Such advances make it easier for more researchers to participate in reinforcement learning research and help advance artificial intelligence in games and other domains.
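The Gumbel method mentioned above relies on the Gumbel-Top-k trick to sample a small set of root actions without replacement, so the search only needs to expand those few candidates instead of every legal move. The following is a minimal illustrative sketch of that trick; it is a sketch only, not the implementation used in this thesis, and the board size, candidate count, and function name are assumptions made for illustration.

# Minimal sketch of the Gumbel-Top-k trick used at the root by Gumbel-style search:
# adding independent Gumbel(0, 1) noise to the policy logits and keeping the k
# largest entries samples k distinct actions in proportion to the policy.
import numpy as np

def gumbel_top_k(policy_logits: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Sample k distinct action indices proportionally to softmax(policy_logits)."""
    gumbel_noise = rng.gumbel(size=policy_logits.shape)   # g ~ Gumbel(0, 1)
    perturbed = policy_logits + gumbel_noise               # logits + g
    return np.argsort(-perturbed)[:k]                      # indices of the k largest values

# Illustrative root step on a hypothetical 15x15 board: search only the k sampled
# candidates rather than expanding every legal move.
rng = np.random.default_rng(0)
logits = rng.normal(size=15 * 15)                          # stand-in for a policy head output
candidates = gumbel_top_k(logits, k=16, rng=rng)
print("root candidates to search:", candidates)

In the full Gumbel procedure, these sampled candidates are then narrowed down with sequential halving under a small simulation budget, which is why only a handful of node expansions per move are needed.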
Description
Keywords
Neural Network, Outer-Open Gomoku, AlphaZero, KataGo, Gumbel