蒙地卡羅樹搜索法的必贏策略以及快速Nonogram解題程式的實作

陳彥吉; Chen, Yen-Chi

蒙地卡羅樹搜索法的必贏策略以及快速Nonogram解題程式的實作

dc.contributor	林順喜	zh_TW
dc.contributor	Lin, Shun-Shii	en_US
dc.contributor.author	陳彥吉	zh_TW
dc.contributor.author	Chen, Yen-Chi	en_US
dc.date.accessioned	2019-09-05T11:15:42Z
dc.date.available	2024-07-25
dc.date.available	2019-09-05T11:15:42Z
dc.date.issued	2019
dc.description.abstract	DeepMind的AlphaZero展現了增強式學習即使在沒有人類知識的情況下也能表現出超越人類世界冠軍的棋力。然而AlphaZero所使用的蒙地卡羅樹搜索法無法根據遊戲理論值來評估盤面好壞。即使遊戲的結果已經被得知，蒙地卡羅樹仍會拜訪這個節點。在這篇論文中我們提出了Exact-win策略來對蒙地卡羅樹進行剪枝。Exact-win讓MCTS不再去處理已知遊戲理論值的節點，增加發現其他關鍵走步的機會。實驗結果顯示了我們的Exact-win方法在一些即死遊戲上顯著提升了原始MCTS的棋力，像是在井字遊戲和連四棋。在使用了Exact-win策略之後，Exact-win與原始版本的Leela Zero、ELF OpenGo和PhoenixGo對下了100盤後分別取得61、58和51場勝場。雖然DeepMind的AlphaZero仍未開源，但我們期待未來我們的方法也能用來加強AlphaZero。就我們所知，這是第一個可以直接加強AlphaZero的方法。在本篇論文中我們也將揭露我們的Nonogram程式Requiem的實作方式，該程式在近幾次的比賽中都以十分顯著的時間差距贏得冠軍。Nonogram是一個單人的紙筆邏輯遊戲，玩家須根據每一行每一列的提示來對二維的方格填入顏色。我們改進了吳老師等人的方法，藉由自由度參數來減少maximal painting的計算開銷。並結合一個設計好的位元盤面表示法來配合BMI指令架構，在加速運算的同時減少記憶體的負載。我們的Nonogram程式正確地解開了2011年到2018年間的所有錦標賽的題目，並且比歷年的程式都來得快。	zh_TW
dc.description.abstract	DeepMind’s AlphaZero demonstrated that reinforcement learning could reach the strength of over the human champions without human knowledge. However, the Monte-Carlo Tree Search (MCTS) used by AlphaZero cannot know the theoretical value of a board state. Even if the result of a node has been known, MCTS will still revisit this node. In this thesis, we propose the Exact-win strategy to prune the Monte-Carlo tree. Exact-win allows MCTS to no longer deal with the nodes that have the determined theoretical values and increases the chances of discovering other key moves. The experiments show that our Exact-win method substantially enhances the strength of the original MCTS in some sudden death games, such as the Tic-Tac-Toe and the Connect4. After employing the Exact-win strategy, Exact-win won 61, 58, and 51 of per 100 games against the original version of Leela Zero, ELF OpenGo, and PhoenixGo, respectively. DeepMind’s AlphaZero is currently not open source. If they make it open source, we expect that our approach can also promote AlphaZero. As far as we know, this is the first concept to enhance AlphaZero's approach with the concrete experiments performed on Go. Also, in this thesis, we reveal the implementation of our Nonogram program, named Requiem, which won all the games by a significant time gap in recent competitions. Nonogram is a pen and paper single-player logic game in which players paint each cell of a two-dimensional grid according to clues for specific rows and columns. We improve the method proposed by Wu et al. by adding a freedom parameter which can significantly reduce the total computation cost of maximal painting. Combining a well-designed bitboard with BMI instruction, we reduce memory loading while accelerating operations. Our Nonogram program solved all 1000 puzzles in every Nonogram Tournament from 2011 to 2018 faster than all nonogram programs.	en_US
dc.description.sponsorship	資訊工程學系	zh_TW
dc.identifier	G060647038S
dc.identifier.uri	http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22G060647038S%22.&%22.id.&
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106522
dc.language	英文
dc.subject	增強式學習	zh_TW
dc.subject	蒙地卡羅樹搜索法	zh_TW
dc.subject	AlphaZero	zh_TW
dc.subject	必贏策略	zh_TW
dc.subject	Nonogram	zh_TW
dc.subject	謎題	zh_TW
dc.subject	動態規劃	zh_TW
dc.subject	位元操作指令	zh_TW
dc.subject	Reinforcement learning	en_US
dc.subject	Monte-Carlo Tree Search	en_US
dc.subject	AlphaZero	en_US
dc.subject	Exact-win	en_US
dc.subject	Nonogram	en_US
dc.subject	Puzzle	en_US
dc.subject	Dynamic Programming	en_US
dc.subject	BMI	en_US
dc.title	蒙地卡羅樹搜索法的必贏策略以及快速Nonogram解題程式的實作	zh_TW
dc.title	Exact-win Strategy for Monte Carlo Tree Search and the Implementation of a Fast Nonogram Solver	en_US

Collections

學位論文

蒙地卡羅樹搜索法的必贏策略以及快速Nonogram解題程式的實作

Files

Collections