A Vision-Based Human Action Recognition System for Companion Robots
Abstract
Advances in medical technology and the prevalence of double-income families mean that working-age adults, busy with their jobs, are often unable to care for and accompany the elderly and children at home. If a companion robot can assist with this care and companionship, it can both reduce the burden on working adults and increase the sense of security and quality of life of the elderly and children. A companion robot mainly assists in the daily life of the elderly and children: it cares for and accompanies them, understands the behavior and state of the accompanied person, and makes an appropriate response, so as to provide interaction, companionship, and care. This study therefore develops a vision-based human action recognition system for companion robots that automatically recognizes the actions of the accompanied person to support companionship, care, and observation.
Human action feature extraction in this system has two parts: depth information features and human contour features. After the system reads in a sequence of depth images of a human action, it first verifies the person's position. It then constructs a range histogram from the person's depth image and accumulates the histograms over multiple frames as the depth information feature. In parallel, it detects the person's contour and computes the distance from each contour point to the contour's top point, yielding a signature of relative distances; the frame-to-frame differences of this signature are accumulated as the description of the contour motion. Finally, two Extreme Learning Machines perform hierarchical action classification: the first stage classifies the depth information feature, and if the first stage does not yield a result, a second stage classifies the human contour feature.
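To make the two feature streams concrete, the following Python sketch shows one possible implementation. It is only an illustration under stated assumptions: the binary person mask, the number of histogram bins, the contour sampling density, and the normalization scheme are not specified in the abstract and are chosen here for the example, and OpenCV is assumed for contour detection.

```python
import numpy as np
import cv2

def range_histogram(depth_frame, mask, n_bins=32, max_depth_mm=4500):
    """Histogram of depth values inside the person mask (one frame)."""
    person_depths = depth_frame[mask > 0]
    hist, _ = np.histogram(person_depths, bins=n_bins, range=(0, max_depth_mm))
    return hist / max(hist.sum(), 1)            # normalize so frames are comparable

def depth_feature(depth_frames, masks, n_bins=32):
    """Accumulate per-frame range histograms into one depth feature vector."""
    acc = np.zeros(n_bins)
    for frame, mask in zip(depth_frames, masks):
        acc += range_histogram(frame, mask, n_bins)
    return acc / len(depth_frames)              # averaging as one form of accumulation

def contour_signature(mask, n_samples=64):
    """Distances from sampled contour points to the contour's top point."""
    # mask: binary uint8 image of the person region
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze(1)  # (N, 2) points
    top = contour[contour[:, 1].argmin()]       # topmost point (min y in image coords)
    idx = np.linspace(0, len(contour) - 1, n_samples).astype(int)
    d = np.linalg.norm(contour[idx] - top, axis=1)
    return d / max(d.max(), 1e-6)               # relative distances

def contour_motion_feature(masks, n_samples=64):
    """Accumulate frame-to-frame differences of the contour-distance signature."""
    sigs = np.stack([contour_signature(m, n_samples) for m in masks])
    return np.abs(np.diff(sigs, axis=0)).sum(axis=0)
```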
Eight actions are recognized: walking (approaching from a distance), bowing from the front, shaking hands (left or right hand), bending over, reaching for an object, waving the left hand, waving the right hand, and squatting. The experiments use 760 video clips, each containing a single action, totaling about 15,156 frames, recorded from adults aged 23 to 28. Of these, 560 clips form the training set (seven people each performing the eight actions ten times) and the remaining 200 clips form the test set (five people each performing the eight actions five times). The experimental results show a human action recognition rate of about 85.0%, indicating that the system's recognition results are reliable.
Because of advances in medical technology and the prevalence of double-income families, young adults are busy with work and do not have much time to take care of the elderly and children. A companion robot that helps young adults care for the elderly and children can reduce the pressure on young adults; it can also increase the family's sense of security and quality of life. The main capability of a companion robot is to assist the elderly and children in daily life: it takes care of and accompanies them, understands their behavior, and makes the corresponding response, so as to achieve interaction, companionship, care, and observation. Therefore, this study proposes a vision-based human action recognition system for companion robots. The input videos of the proposed system are obtained from one Kinect 2.0 for Xbox One.

Human feature extraction in the system is divided into two parts: depth information feature extraction and human contour feature extraction. When the system starts, it verifies the person's position in the input images and then extracts the features. For the depth information feature, a range histogram is constructed for each frame and accumulated; the accumulated range histogram serves as the depth information feature. For the human contour feature, the system detects the human contour and computes the distance between each contour point and the contour's top point; the accumulated frame-to-frame differences of these distances serve as the contour feature. For classification, two Extreme Learning Machines are used in a two-stage hierarchy: the first stage classifies the depth information feature, and when the first stage does not yield a result, the second stage classifies the human contour feature.

Eight actions are studied: walk, bow, shake hands, bend, take, wave right hand, wave left hand, and squat. There are 760 experimental sequences with 15,156 frames in total, and each sequence contains only one action. The average human action recognition rate is 85.0%. The results show that the proposed system is robust and efficient.
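For the classification stage, the sketch below shows a minimal single-hidden-layer Extreme Learning Machine (random input weights, least-squares output weights) together with the two-stage fallback described above. The abstract does not state how the first stage is judged to have produced no result, so the margin test on the top-two output scores is an assumption made for illustration.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer Extreme Learning Machine."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, n_classes):
        # Random input weights and biases stay fixed; only output weights are solved.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)         # hidden-layer activations
        T = np.eye(n_classes)[y]                 # one-hot targets
        self.beta = np.linalg.pinv(H) @ T        # least-squares output weights
        return self

    def scores(self, x):
        h = np.tanh(x @ self.W + self.b)
        return h @ self.beta

def classify_action(depth_feat, contour_feat, elm_depth, elm_contour, margin=0.2):
    """Two-stage hierarchical classification: trust stage 1 only when decisive."""
    s1 = elm_depth.scores(depth_feat)
    top2 = np.sort(s1)[-2:]
    if top2[1] - top2[0] >= margin:              # margin threshold is an assumption
        return int(np.argmax(s1))
    s2 = elm_contour.scores(contour_feat)        # fall back to the contour stage
    return int(np.argmax(s2))
```

In use, elm_depth would be fit on the accumulated range-histogram vectors and elm_contour on the accumulated contour-difference vectors, each with the eight action labels from the training set.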
Keywords
human action recognition, Kinect 2.0 for Xbox One, depth image, human contour, Extreme Learning Machines