Applying Open-Ended Learning to Optimize Multi-Objective Connection Games

Date

2024

Abstract

Open-ended learning is an approach to AI proposed by Google DeepMind in 2021. Unlike conventional AI systems, which are trained to optimize a single task, an open-ended learning AI is not tuned to be optimal at any one task but can instead perform many different tasks; its goal is multi-objective optimization. Because the concept is very new, the literature on it is still relatively sparse and its practical implementation remains loosely defined. This study therefore uses relatively familiar techniques and game rules to attempt to implement an AI similar or identical to an open-ended learning AI.

A connection game is a two-player board game in which the players take turns placing stones on a Go board, and the first player to line up a specified number of their own stones horizontally, vertically, or diagonally wins. This study uses five-in-a-row, four-in-a-row, and three-in-a-row variants; besides setting the target to five, four, or three stones, the variants also reduce the board size.

Since the training data for an open-ended learning AI is generated by programs, this study uses alpha-zero-general, which generates training data through self-play, as the core of the implementation. The experiment modifies the self-play component of alpha-zero-general so that the trained AI can play under multiple rule sets.
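The win rule and the idea of mixing several rule sets into self-play can be illustrated with a minimal Python sketch. It is not the thesis code and does not use the alpha-zero-general API. The `Variant` class, the board sizes in `VARIANTS`, and the function names are hypothetical, and a random move policy stands in for the search-guided self-play that an AlphaZero-style system would use.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    size: int    # board is size x size
    target: int  # stones in a row needed to win


# Hypothetical board sizes: the abstract only says that the board is reduced.
VARIANTS = [Variant(9, 5), Variant(6, 4), Variant(3, 3)]

# Right, down, and the two diagonals; scanning every cell covers all lines.
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))


def has_won(board, player, target):
    """Return True if `player` has `target` stones in a straight line."""
    n = len(board)
    for r in range(n):
        for c in range(n):
            if board[r][c] != player:
                continue
            for dr, dc in DIRECTIONS:
                end_r = r + (target - 1) * dr
                end_c = c + (target - 1) * dc
                if not (0 <= end_r < n and 0 <= end_c < n):
                    continue  # the line would leave the board
                if all(board[r + i * dr][c + i * dc] == player
                       for i in range(target)):
                    return True
    return False


def random_self_play_episode(variant, rng):
    """Play random legal moves under one variant.

    Returns (variant, winner), where winner is 1, -1, or 0 for a draw.
    The random policy is a placeholder assumption, not the thesis method.
    """
    board = [[0] * variant.size for _ in range(variant.size)]
    cells = [(r, c) for r in range(variant.size) for c in range(variant.size)]
    rng.shuffle(cells)
    player = 1
    for r, c in cells:
        board[r][c] = player
        if has_won(board, player, variant.target):
            return variant, player
        player = -player
    return variant, 0


def generate_episodes(num_episodes, seed=0):
    """Sample a rule variant per episode so the data mixes all rule sets."""
    rng = random.Random(seed)
    return [random_self_play_episode(rng.choice(VARIANTS), rng)
            for _ in range(num_episodes)]
```

Sampling a variant per episode is only one possible way to expose a single network to several rule sets; the abstract does not state how the modified self-play schedules the variants.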

Keywords

Connection Games, AlphaZero, Open-ended Learning
