A Combined SlowFast Network and Attention-Adaptive Graph Convolutional Architecture for Skeleton-Based Human Action Recognition
Date
2022
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
This thesis studies RGB-based and skeleton-based action recognition. Skeleton-based action recognition has advanced rapidly in recent years, with graph convolutional networks (GCNs) that express the human body structure through adjacency matrices; recent work emphasizes long-range joint connectivity within the GCN and learns multiple skeleton modalities to reach higher accuracy on large datasets. We argue that analyzing the motion itself is as important as learning diverse data modalities. We therefore adopt the two-stream idea from RGB-based action recognition and process a single skeleton modality at a high and a low frame rate, extracting dynamic and static motion information respectively. The two streams also serve as two joint-connection strategies, focusing on strided-time and adjacent-time links respectively, and fusion layers that mix static and dynamic features are interleaved between network stages. On the large-scale NTU RGB+D dataset, the proposed architecture reaches 95.9% accuracy under the single-modality evaluation and 96.8% under the multi-modality evaluation. The experimental results confirm that the proposed method achieves better results.
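The two-stream sampling idea described above can be sketched minimally: one skeleton sequence is read at a high frame rate (dynamic cues) and a low frame rate (static cues), and the two feature streams are later fused. This is a hypothetical illustration in NumPy, not the thesis implementation; the function names, the stride parameter `alpha`, and the concatenation-based fusion are all assumptions (the thesis interleaves learned fusion layers between GCN stages).

```python
import numpy as np

def slowfast_sample(seq, alpha=4):
    """Split one skeleton sequence (T, V, C) into a slow and a fast view.

    seq   : array of shape (T, V, C) - T frames, V joints, C channels.
    alpha : temporal stride of the slow pathway (the fast pathway keeps
            every frame). Both names are illustrative assumptions.
    """
    fast = seq            # high frame rate: dynamic motion information
    slow = seq[::alpha]   # low frame rate: static posture information
    return slow, fast

def fuse(slow_feat, fast_feat, alpha=4):
    """Toy lateral fusion: time-align the fast stream, then concatenate.

    Here the fast features are simply strided down to the slow rate before
    channel-wise concatenation; the actual architecture uses learned
    fusion layers interleaved across network stages.
    """
    aligned = fast_feat[::alpha]
    return np.concatenate([slow_feat, aligned], axis=-1)

# Toy sequence: 32 frames, 25 joints (the NTU RGB+D layout), 3 coordinates.
seq = np.random.randn(32, 25, 3)
slow, fast = slowfast_sample(seq, alpha=4)
fused = fuse(slow, fast, alpha=4)
print(slow.shape, fast.shape, fused.shape)  # (8, 25, 3) (32, 25, 3) (8, 25, 6)
```

The stride `alpha=4` mirrors the high/low frequency split: the slow stream sees one quarter of the frames, so after fusion each slow-rate time step carries both static and (subsampled) dynamic channels.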
Description
Keywords
Action Recognition, Graph Convolutional Network, Feature Fusion