Analysis of an Appropriate Elliptical Influence Range in Multi-Agent Reinforcement Learning
Date
2022
Abstract
In recent years, reinforcement learning has seen increasingly wide use within machine learning: an agent interacts with its environment under certain constraints and learns a good policy and behavior pattern, so that it can achieve its goals more effectively or give users a better experience. A single agent's interaction with the environment is limited, however, so it often needs to cooperate with other agents and obtain their information to better understand the environment and choose a more effective policy. Yet having a single agent observe the entire environment, or exchange information with every agent in it, consumes substantial time and resources and is impractical. This raises the question: can we establish a communication mechanism with a suitably bounded range, within which agents exchange messages with one another while still achieving their goals effectively? In this thesis, we use the property of exponential decay to show that, under the same policy, the influence of two agents on each other decays exponentially as the distance between them increases. We then use a confidence ellipse to represent the team formation at each time step, and derive a suitable communication-distance model from the ellipse's major and minor axes together with the distances between teammates. The environment is built on the StarCraft Multi-Agent Challenge (SMAC), and training quality is judged by win rate. A model trained with the suitable communication range shows a faster rise in win rate than models that cannot communicate or that lack an appropriate communication distance.
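The abstract's two quantitative ingredients, exponentially decaying mutual influence and a per-timestep confidence ellipse over teammate positions, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the decay rate, the 95% confidence level, the toy coordinates, and the choice of the semi-major axis as the communication range are all assumptions made for the example.

```python
import numpy as np

def influence(distance, decay_rate=0.5):
    """Mutual influence of two agents under the same policy, modeled as
    exponential decay in their distance (decay_rate is an assumed constant)."""
    return np.exp(-decay_rate * distance)

def confidence_ellipse_axes(positions, chi2_quantile=5.991):
    """Semi-major and semi-minor axes of the confidence ellipse of 2D positions.

    chi2_quantile = 5.991 is the 95% chi-square quantile with 2 degrees of
    freedom, the standard scaling for a 2D confidence ellipse.
    """
    cov = np.cov(positions, rowvar=False)      # 2x2 covariance of (x, y)
    eigvals = np.linalg.eigvalsh(cov)          # eigenvalues in ascending order
    semi_minor, semi_major = np.sqrt(chi2_quantile * eigvals)
    return semi_major, semi_minor

# Toy formation: four teammates spread more along x than along y.
team = np.array([[0.0, 0.0], [4.0, 1.0], [8.0, 0.5], [2.0, 2.0]])
semi_major, semi_minor = confidence_ellipse_axes(team)

# One hypothetical way to bound the communication range by the formation:
# cap it at the ellipse's semi-major axis, so messages cover the team's spread.
comm_range = semi_major
```

The ellipse's axes shrink or grow with the team's dispersion at each time step, which is what allows the communication distance to adapt to the formation rather than being fixed in advance.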
Keywords
Multi-agent reinforcement learning, Confidence ellipse, StarCraft Multi-Agent Challenge