Many existing approaches based on handcrafted features and conventional machine learning cannot capture the strong correlations of the skeleton structure in the spatiotemporal dimension. On the other hand, some modern methods that use Long Short-Term Memory (LSTM) networks to learn temporal action attributes lack an efficient scheme for revealing high-level informative features. To address these issues, this paper presents a novel hierarchical deep feature fusion model for 3D skeleton-based action recognition, in which the deep information for modeling human appearance and action dynamics is obtained with Convolutional Neural Networks (CNNs). Deep features of geometric joint distance and orientation are extracted via a multi-stream CNN architecture to uncover hidden correlations in the spatiotemporal dimension. Experimental results on the NTU RGB+D dataset demonstrate the superiority of the proposed fusion model over several recent deep learning (DL)-based action recognition approaches.
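The abstract does not define the geometric features precisely. As a minimal sketch, assuming "joint distance" means the pairwise Euclidean distance between 3D joints within a frame and "orientation" means the unit direction vector between each joint pair, the per-frame maps that a distance stream and an orientation stream of a multi-stream CNN could consume might be computed as below. The function names are hypothetical and not taken from the paper. For NTU RGB+D skeletons (25 joints per body), each frame yields a 25 x 25 distance map and a 25 x 25 x 3 orientation map.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) coordinates of one joint


def joint_distance_map(frame: Sequence[Joint]) -> List[List[float]]:
    """Hypothetical helper: N x N pairwise Euclidean distances (symmetric, zero diagonal)."""
    return [[math.dist(a, b) for b in frame] for a in frame]


def joint_orientation_map(frame: Sequence[Joint]) -> List[List[Joint]]:
    """Hypothetical helper: unit vectors pointing from joint i to joint j.

    Assumption: coincident joints get a zero vector instead of an undefined direction.
    """
    out: List[List[Joint]] = []
    for a in frame:
        row: List[Joint] = []
        for b in frame:
            d = [bj - aj for aj, bj in zip(a, b)]
            norm = math.sqrt(sum(c * c for c in d))
            row.append(tuple(c / norm for c in d) if norm > 0 else (0.0, 0.0, 0.0))
        out.append(row)
    return out


def sequence_features(seq: Sequence[Sequence[Joint]]):
    """Stack per-frame maps over time: T x N x N distances, T x N x N x 3 orientations."""
    distances = [joint_distance_map(frame) for frame in seq]
    orientations = [joint_orientation_map(frame) for frame in seq]
    return distances, orientations


if __name__ == "__main__":
    # Toy sequence: one frame with two joints, 5 units apart in the x-y plane.
    dist, orient = sequence_features([[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]])
    print(dist[0])       # pairwise distances for frame 0
    print(orient[0][0])  # directions from joint 0 to every joint
```

Stacking the maps along the time axis turns each skeleton sequence into image-like tensors, which is one common way to let 2D CNN streams learn spatial (joint-pair) and temporal (frame) correlations jointly; how the paper actually encodes and fuses these streams is not specified in the abstract.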