基於馬可夫決策過程之路徑規劃演算法用於複雜動態環境 (Path Planning Algorithm Based on Markov Decision Process for Complex Dynamic Environment)

dc.contributor陳美勇zh_TW
dc.contributorChen, Mei-Yungen_US
dc.contributor.author陳宥儒zh_TW
dc.contributor.authorChen, Yu-Juen_US
dc.date.accessioned2023-12-08T07:50:10Z
dc.date.available2023-02-13
dc.date.available2023-12-08T07:50:10Z
dc.date.issued2023
dc.description.abstract本論文提出了一種基於馬可夫決策過程的機器人路徑規劃演算法。首先,將目標點設為一個正的獎勵訊號;其次,代理人每走一格就會得到一個負的獎勵訊號。代理人的唯一目標是最大化其長期累積的總獎勵,並根據能將長期獎勵最大化的策略來決定機器人的行走路徑;最後,將每個位置所得到的策略串聯起來,即可得到一條最佳路徑。此外,本論文透過設計馬可夫決策過程中的價值函數,使規劃出的路徑能與牆壁及移動障礙物保持一定的安全距離。在本論文的模擬中,代理人產生第一條路徑之後,能因應環境變化迅速產生新路徑,並主動閃避移動障礙物;在實驗部分,搭載機器人作業系統(Robot Operating System, ROS)的雙輪差動機器人在有移動的障礙物與移動的人的環境中,皆能有效產生閃避障礙物之路徑。此路徑規劃演算法由馬可夫決策過程發展而成,而馬可夫決策過程也是現代機器學習的基石。有別於 Dijkstra、A*、D* 等傳統路徑規劃演算法在複雜動態環境中表現不佳、甚至無法適用於動態環境,本論文提出的基於馬可夫決策過程之路徑規劃演算法以計算全域地圖上各點的獎勵訊號來決定路徑;在每個時刻、每一個點都會有一個預期回報的期望值,因此在動態變化較大的環境中能較即時地更改路徑,在動態環境中的效率較佳。zh_TW
dc.description.abstractThis paper presents a robot path planning algorithm based on the Markov decision process (MDP). First, the target point is set as a positive reward signal. Second, the agent receives a negative reward signal for every cell it moves. Maximizing its long-term accumulated reward is the agent's only goal, and the robot's path is determined by the policy that maximizes this long-term reward. Finally, concatenating the policies obtained at each position yields an optimal path. In addition, the value function of the Markov decision process is designed so that the planned path keeps a safe distance from walls and moving obstacles. In the simulations, after generating the first path the agent quickly produces new paths in response to environmental changes and actively dodges moving obstacles; in the experiments, a two-wheeled differential-drive robot running the Robot Operating System (ROS) effectively generates obstacle-avoiding paths in the presence of moving obstacles and moving people. This path planning algorithm is developed from the Markov decision process, which is also a cornerstone of modern machine learning. Unlike traditional path planning algorithms such as Dijkstra, A*, and D*, which perform poorly in complex dynamic environments or cannot be applied to dynamic environments at all, the proposed MDP-based algorithm determines the path by computing the reward signal of every point on the global map. At every time step, each point carries an expected return, so the path can be updated almost in real time when the environment changes substantially, making the algorithm more efficient in dynamic environments (a minimal illustrative sketch of this reward formulation follows the metadata fields below).en_US
dc.description.sponsorship機電工程學系zh_TW
dc.identifier60973018H-42909
dc.identifier.urihttps://etds.lib.ntnu.edu.tw/thesis/detail/c90207ff9d7e4756c460db6bffeaefba/
dc.identifier.urihttp://rportal.lib.ntnu.edu.tw/handle/20.500.12235/120541
dc.language中文
dc.subject馬可夫決策過程zh_TW
dc.subject複雜動態環境zh_TW
dc.subject路徑規劃zh_TW
dc.subject雙輪差動機器人zh_TW
dc.subjectMarkov decision processen_US
dc.subjectcomplex dynamic environmenten_US
dc.subjectpath planningen_US
dc.subjecttwo-wheeled mobile roboten_US
dc.title基於馬可夫決策過程之路徑規劃演算法用於複雜動態環境zh_TW
dc.titlePath Planning Algorithm Based on Markov Decision Process for Complex Dynamic Environmenten_US
dc.typeetd
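
The abstracts above describe the reward formulation only in words: a positive reward at the target, a negative reward for every step, and a value function shaped so the path keeps a safety margin from walls and moving obstacles. The thesis does not publish its code, so the following Python sketch is only a rough, non-authoritative illustration of that idea using value iteration on a small static occupancy grid; the grid layout, the reward values (goal_reward, step_reward, near_wall_penalty), the discount factor gamma, and the greedy path extraction are all illustrative assumptions, not the thesis implementation.

# Minimal value-iteration sketch of the reward idea described in the abstract:
# a positive reward at the goal, a negative reward for every step, and an
# extra penalty next to obstacles so the planned path keeps a safety margin.
# The grid and all numbers below are illustrative assumptions.

import numpy as np

FREE, WALL = 0, 1
grid = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
])
goal = (4, 4)
gamma = 0.95               # discount factor (assumed)
step_reward = -1.0         # negative signal for every move
goal_reward = 10.0         # positive signal at the target point
near_wall_penalty = -2.0   # extra cost next to a wall/obstacle (safety margin)

moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
rows, cols = grid.shape

def free_neighbors(cell):
    """Adjacent cells that are inside the map and not occupied."""
    r, c = cell
    for dr, dc in moves:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == FREE:
            yield (nr, nc)

def near_wall(cell):
    """True if any 4-neighbor of the cell is occupied."""
    r, c = cell
    return any(0 <= r + dr < rows and 0 <= c + dc < cols and grid[r + dr, c + dc] == WALL
               for dr, dc in moves)

def reward(cell):
    """Reward received when the agent enters `cell`."""
    if cell == goal:
        return goal_reward
    return step_reward + (near_wall_penalty if near_wall(cell) else 0.0)

# Value iteration over the free cells (deterministic 4-connected moves).
V = np.zeros(grid.shape)
for _ in range(500):
    V_new = V.copy()
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == WALL or (r, c) == goal:
                continue
            candidates = [reward(n) + gamma * V[n] for n in free_neighbors((r, c))]
            if candidates:
                V_new[r, c] = max(candidates)
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

def plan_path(start, max_steps=100):
    """Greedily follow the converged values from `start` to the goal."""
    path, cell = [start], start
    for _ in range(max_steps):
        if cell == goal:
            break
        cell = max(free_neighbors(cell), key=lambda n: reward(n) + gamma * V[n])
        path.append(cell)
    return path

print(plan_path((0, 0)))   # a cell-by-cell route that skirts the obstacles

In the thesis the environment is dynamic, so a computation of this kind would have to be repeated (or incrementally updated) whenever the occupancy grid changes; the sketch above covers only a single static snapshot.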

Files

Original bundle

Name: 202300042909-105227.pdf
Size: 4.54 MB
Format: Adobe Portable Document Format
Description: etd
