Deep Learning-Based Online Human Action Recognition for Indoor Smart Mobile Robots

Date

2020

Abstract

This study proposes a deep learning-based online human action recognition system for indoor smart mobile robots. The system recognises human actions online from visual input while the camera moves toward the target person. Its main purpose is to give intelligent human-robot interaction more interface options beyond voice control and touchscreens.

The system uses three kinds of visual input: colour image information, short-term dynamic information and skeletal joint information. It consists of five stages: human detection, human tracking, feature extraction, action classification and fusion. In the human detection stage, a two-dimensional (2D) pose estimation method locates people in the image. In the human tracking stage, the Deep SORT tracking method follows each detected person. In the feature extraction stage, three kinds of features, spatial, short-term temporal and structural (skeletal), are extracted from the tracked person. In the action classification stage, each kind of feature is fed to its own trained long short-term memory (LSTM) network. In the fusion stage, the outputs of the three LSTM classifiers are combined to produce the system's final classification.

This study also constructs a moving-camera human action dataset, the Computer Vision and Image Understanding (CVIU) Moving Camera Human Action dataset (CVIU dataset). It contains 3,646 human action videos covering 11 single-person actions and 5 two-person interactions, recorded from three different camera angles. The single-person actions are drinking while standing, drinking while sitting, eating while standing, eating while sitting, playing with a phone, sitting down, standing up, using a laptop, walking straight, walking horizontally and reading. The two-person interactions are kicking, hugging, carrying an object, walking toward each other and walking away from each other. The dataset was used to train and evaluate the proposed system. Experimental results show recognition rates of 96.64% for the spatial-feature classifier, 81.87% for the short-term temporal-feature classifier and 68.10% for the structural-feature classifier. Fusing the three classifiers yields a recognition rate of 96.84%.
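
The abstract does not say which fusion rule combines the three LSTM outputs. As an illustration only, the sketch below assumes a weighted average of per-class probability vectors (a common late-fusion scheme). The function names `fuse_scores` and `predict`, the `weights` parameter, and the label strings are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Sequence

# The 16 action classes named in the abstract (11 single-person, 5 two-person).
# The label strings and their order are illustrative assumptions.
ACTIONS = [
    "drink (standing)", "drink (sitting)", "eat (standing)", "eat (sitting)",
    "play with phone", "sit down", "stand up", "use laptop",
    "walk straight", "walk horizontally", "read",
    "kick", "hug", "carry object",
    "walk toward each other", "walk away from each other",
]


def fuse_scores(
    streams: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-class probability vectors.

    Assumed late-fusion rule; the thesis may use a different one.
    Each element of `streams` is one classifier's probability vector.
    """
    if not streams:
        raise ValueError("at least one stream is required")
    n_classes = len(streams[0])
    if any(len(s) != n_classes for s in streams):
        raise ValueError("all streams must have the same number of classes")
    if weights is None:
        weights = [1.0] * len(streams)  # equal weighting by default
    if len(weights) != len(streams):
        raise ValueError("one weight per stream is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * s[i] for w, s in zip(weights, streams)) / total
        for i in range(n_classes)
    ]


def predict(
    streams: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Return the action label with the highest fused score."""
    fused = fuse_scores(streams, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIONS[best]
```

With equal weights, two streams voting for one class outvote a third stream voting for another. Giving one stream a larger weight lets it dominate, which is one plausible way to favour the stronger classifier when the individual recognition rates differ as much as those reported here.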

Keywords

Online human action recognition, Indoor smart mobile robot, Moving camera, Deep learning, Long short-term memory, Bi-directional long short-term memory, Temporal enhancement long short-term memory, Spatial feature, Temporal feature, Structural feature