Enhancing a Connect Four Program with Multiple Policy Value MCTS Combined with Population Based Training

Date

2024

Abstract

Computer game playing is one of the oldest and best-known applications of artificial intelligence in computer science and engineering, and AlphaZero is a very powerful reinforcement learning algorithm for board games. AlphaZero combines Monte Carlo Tree Search (MCTS) with deep neural networks. Larger neural networks have an advantage in evaluation accuracy, while smaller networks have an advantage in cost and efficiency; under a limited budget, the two must be balanced. Multiple Policy Value Monte Carlo Tree Search (MPV-MCTS) combines multiple neural networks of different sizes while retaining the strengths of each. In this study, we modified Surag Nair's AlphaZero General program from GitHub to incorporate Multiple Policy Value Monte Carlo Tree Search and applied it to the game of Connect Four. We also used multiprocessing in the program to accelerate training. Finally, we used Population Based Training to search for better hyperparameters.
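
To make the MPV-MCTS idea above concrete, the following Python sketch interleaves a cheap small network with an expensive large network inside one search budget. It is an illustration only: the stub networks, the toy Node class, and the budget split are hypothetical, and the original MPV-MCTS maintains two separate search trees, which this sketch collapses into a single root for brevity.

    import math
    import random

    LEGAL_MOVES = list(range(7))  # Connect Four has 7 columns

    def small_net(state):
        """Stub for a fast, shallow policy/value network (cheap, noisy)."""
        rng = random.Random(hash((str(state), "small")))
        raw = [rng.random() for _ in LEGAL_MOVES]
        total = sum(raw)
        return [p / total for p in raw], rng.uniform(-0.5, 0.5)

    def large_net(state):
        """Stub for a slow, deep policy/value network (costly, accurate)."""
        rng = random.Random(hash((str(state), "large")))
        raw = [rng.random() for _ in LEGAL_MOVES]
        total = sum(raw)
        return [p / total for p in raw], rng.uniform(-0.1, 0.1)

    class Node:
        def __init__(self, priors):
            self.N = [0] * len(priors)    # visit count per move
            self.W = [0.0] * len(priors)  # accumulated value per move
            self.P = priors               # network priors

        def select(self, c_puct=1.5):
            """AlphaZero-style PUCT selection over the root's moves."""
            total = sum(self.N) + 1

            def puct(a):
                q = self.W[a] / self.N[a] if self.N[a] else 0.0
                return q + c_puct * self.P[a] * math.sqrt(total) / (1 + self.N[a])

            return max(LEGAL_MOVES, key=puct)

    def mpv_search(state, total_sims=80, large_every=4):
        """Spend most simulations on the small net, and every
        `large_every`-th simulation on the large net."""
        root = Node(small_net(state)[0])
        for i in range(total_sims):
            net = large_net if i % large_every == 0 else small_net
            a = root.select()
            _, v = net((state, a))  # toy: evaluate the chosen child directly
            root.N[a] += 1
            root.W[a] += v
        return max(LEGAL_MOVES, key=lambda a: root.N[a])

    print("chosen column:", mpv_search("empty board"))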
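
The multiprocessing speed-up mentioned in the abstract typically parallelizes self-play, since games are independent of one another. Below is a minimal sketch using Python's standard multiprocessing.Pool; play_one_game is a hypothetical stand-in for a full self-play episode, not the thesis's actual code.

    import random
    from multiprocessing import Pool

    def play_one_game(seed):
        """Hypothetical stand-in for one self-play episode; returns
        (state, move, outcome) training examples."""
        rng = random.Random(seed)
        return [("board_state", rng.randrange(7), rng.uniform(-1, 1))]

    if __name__ == "__main__":
        # Games are independent, so they parallelize cleanly across workers.
        with Pool(processes=4) as pool:
            games = pool.map(play_one_game, range(16))
        examples = [ex for game in games for ex in game]
        print(len(examples), "training examples collected")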
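
Population Based Training (Jaderberg et al., 2017) trains a population of learners in parallel and periodically has weak members copy a strong member's weights and hyperparameters, then perturb them (exploit, then explore). The sketch below illustrates one such step under those assumptions; the quartile rule, the perturbation factors, and all names are hypothetical rather than the thesis's settings.

    import copy
    import random

    def exploit_and_explore(population, perturb=(0.8, 1.2)):
        """One PBT step: the bottom quartile copies weights and
        hyperparameters from the top quartile, then jitters each
        hyperparameter multiplicatively (exploit, then explore)."""
        ranked = sorted(population, key=lambda m: m["score"], reverse=True)
        cut = max(1, len(ranked) // 4)
        for loser in ranked[-cut:]:
            winner = random.choice(ranked[:cut])
            loser["weights"] = copy.deepcopy(winner["weights"])   # exploit
            loser["hparams"] = {k: v * random.choice(perturb)     # explore
                                for k, v in winner["hparams"].items()}

    # Toy population; in the thesis setting each member would be a
    # self-play trainer with hyperparameters such as the learning rate.
    population = [{"hparams": {"lr": 10 ** random.uniform(-4, -2)},
                   "weights": {},
                   "score": random.random()}
                  for _ in range(8)]
    exploit_and_explore(population)
    print(sorted(round(m["hparams"]["lr"], 5) for m in population))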

Keywords

Computer Games, Connect Four, Deep Learning, AlphaZero, Multiple Policy Value Monte Carlo Tree Search
