基於馬可夫決策過程之路徑規劃演算法用於複雜動態環境 (Path Planning Algorithm Based on Markov Decision Process for Complex Dynamic Environment)

dc.contributor陳美勇zh_TW
dc.contributorChen, Mei-Yungen_US
dc.contributor.author陳宥儒zh_TW
dc.contributor.authorChen, Yu-Juen_US
dc.date.accessioned2023-12-08T07:50:10Z
dc.date.available2023-02-13
dc.date.available2023-12-08T07:50:10Z
dc.date.issued2023
dc.description.abstract本論文提出了一種基於馬可夫決策過程的機器人路徑規劃演算法。首先,將目標點設為一個正的獎勵訊號;其次,代理人每走一格就會得到一個負的獎勵訊號。代理人的唯一目標是最大化其長期累積的總獎勵,並根據能將長期獎勵最大化的策略來決定機器人的行走路徑;最後,將每個位置所得到的策略串聯起來,即可得到一條最佳路徑。此外,本論文透過設計馬可夫決策過程中的價值函數,使規劃出的路徑能與牆壁及移動障礙物保持一定的安全距離。在本論文的模擬中,代理人產生第一條路徑之後,能因應環境變化迅速產生新路徑,並主動閃避移動障礙物;在實驗部分,搭載機器人作業系統(Robot Operating System, ROS)的雙輪差動機器人在有移動的障礙物與移動的人的環境中,皆能有效產生閃避障礙物之路徑。此路徑規劃演算法由馬可夫決策過程發展而成,而馬可夫決策過程也是現代機器學習的基石。有別於 Dijkstra、A*、D* 等傳統路徑規劃演算法在複雜動態環境中表現不佳、甚至無法適用於動態環境,本論文提出的基於馬可夫決策過程之路徑規劃演算法以計算全域地圖上各點的獎勵訊號來決定路徑;在每個時刻、每一個點都會有一個預期回報的期望值,因此在動態變化較大的環境中能較即時地更改路徑,在動態環境中的效率較佳。zh_TW
dc.description.abstractThis paper presents a robot path planning algorithm based on the Markov decision process (MDP). First, the target point is set as a positive reward signal. Second, the agent receives a negative reward signal for every cell it moves. Maximizing its long-term accumulated reward is the agent's only goal, and the robot's path is determined by the policy that maximizes this long-term reward. Finally, concatenating the policies obtained at each position yields an optimal path. In addition, the value function of the Markov decision process is designed so that the planned path keeps a safe distance from walls and moving obstacles. In the simulations, after generating the first path the agent quickly produces new paths in response to environmental changes and actively dodges moving obstacles; in the experiments, a two-wheeled differential-drive robot running the Robot Operating System (ROS) effectively generates obstacle-avoiding paths in the presence of moving obstacles and moving people. This path planning algorithm is developed from the Markov decision process, which is also a cornerstone of modern machine learning. Unlike traditional path planning algorithms such as Dijkstra, A*, and D*, which perform poorly in complex dynamic environments or cannot be applied to dynamic environments at all, the proposed MDP-based algorithm determines the path by computing the reward signal of every point on the global map. At every time step, each point carries an expected return, so the path can be updated almost in real time when the environment changes substantially, making the algorithm more efficient in dynamic environments (a minimal illustrative sketch of this reward formulation follows the metadata fields below).en_US
dc.description.sponsorship機電工程學系zh_TW
dc.identifier60973018H-42909
dc.identifier.urihttps://etds.lib.ntnu.edu.tw/thesis/detail/c90207ff9d7e4756c460db6bffeaefba/
dc.identifier.urihttp://rportal.lib.ntnu.edu.tw/handle/20.500.12235/120541
dc.language中文
dc.subject馬可夫決策過程zh_TW
dc.subject複雜動態環境zh_TW
dc.subject路徑規劃zh_TW
dc.subject雙輪差動機器人zh_TW
dc.subjectMarkov decision processen_US
dc.subjectcomplex dynamic environmenten_US
dc.subjectpath planningen_US
dc.subjecttwo-wheeled mobile roboten_US
dc.title基於馬可夫決策過程之路徑規劃演算法用於複雜動態環境zh_TW
dc.titlePath Planning Algorithm Based on Markov Decision Process for Complex Dynamic Environmenten_US
dc.typeetd
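
The abstracts above describe the reward formulation only in words: a positive reward at the target, a negative reward for every step, and a value function shaped so the path keeps a safety margin from walls and moving obstacles. The thesis does not publish its code, so the following Python sketch is only a rough, non-authoritative illustration of that idea using value iteration on a small static occupancy grid; the grid layout, the reward values (goal_reward, step_reward, near_wall_penalty), the discount factor gamma, and the greedy path extraction are all illustrative assumptions, not the thesis implementation.

# Minimal value-iteration sketch of the reward idea described in the abstract:
# a positive reward at the goal, a negative reward for every step, and an
# extra penalty next to obstacles so the planned path keeps a safety margin.
# The grid and all numbers below are illustrative assumptions.

import numpy as np

FREE, WALL = 0, 1
grid = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
])
goal = (4, 4)
gamma = 0.95               # discount factor (assumed)
step_reward = -1.0         # negative signal for every move
goal_reward = 10.0         # positive signal at the target point
near_wall_penalty = -2.0   # extra cost next to a wall/obstacle (safety margin)

moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
rows, cols = grid.shape

def free_neighbors(cell):
    """Adjacent cells that are inside the map and not occupied."""
    r, c = cell
    for dr, dc in moves:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == FREE:
            yield (nr, nc)

def near_wall(cell):
    """True if any 4-neighbor of the cell is occupied."""
    r, c = cell
    return any(0 <= r + dr < rows and 0 <= c + dc < cols and grid[r + dr, c + dc] == WALL
               for dr, dc in moves)

def reward(cell):
    """Reward received when the agent enters `cell`."""
    if cell == goal:
        return goal_reward
    return step_reward + (near_wall_penalty if near_wall(cell) else 0.0)

# Value iteration over the free cells (deterministic 4-connected moves).
V = np.zeros(grid.shape)
for _ in range(500):
    V_new = V.copy()
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == WALL or (r, c) == goal:
                continue
            candidates = [reward(n) + gamma * V[n] for n in free_neighbors((r, c))]
            if candidates:
                V_new[r, c] = max(candidates)
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

def plan_path(start, max_steps=100):
    """Greedily follow the converged values from `start` to the goal."""
    path, cell = [start], start
    for _ in range(max_steps):
        if cell == goal:
            break
        cell = max(free_neighbors(cell), key=lambda n: reward(n) + gamma * V[n])
        path.append(cell)
    return path

print(plan_path((0, 0)))   # a cell-by-cell route that skirts the obstacles

In the thesis the environment is dynamic, so a computation of this kind would have to be repeated (or incrementally updated) whenever the occupancy grid changes; the sketch above covers only a single static snapshot.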

Files

Original bundle

Name: 202300042909-105227.pdf
Size: 4.54 MB
Format: Adobe Portable Document Format
Description: etd
