Recently, numerous methods have been introduced for 3D action recognition using handcrafted feature descriptors coupled with traditional classifiers. However, they cannot fully learn the high-level features of a whole skeleton sequence. In this paper, a novel encoding technique, namely Pose Feature to Image (PoF2I), is introduced to transform the pose features of joint-joint distance and orientation into color pixels. By concatenating the features of all skeleton frames in a sequence, a color image is generated that depicts the spatial joint correlations and temporal pose dynamics of an action. Recognition models are learned by end-to-end fine-tuning of a pre-trained deep convolutional neural network, which captures multiple high-level features from a multi-scale action representation. We further propose an efficient data augmentation mechanism to enrich the training information and prevent overfitting. The experimental results on six challenging 3D action recognition datasets demonstrate that the proposed method outperforms state-of-the-art approaches.
Available online: https://ieeexplore.ieee.org/document/8691567
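The encoding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `encode_sequence`, the grey-level mapping of distances, the mapping of unit direction vectors to RGB, and the stacking of distance rows above orientation rows are all assumptions made for illustration only. Each image column corresponds to one skeleton frame and each row to one joint pair.

```python
import math
from itertools import combinations


def encode_sequence(frames):
    """Encode a skeleton sequence as an RGB image stored as nested lists.

    frames: list of frames, each a list of (x, y, z) joint coordinates.
    Returns image[row][col] = (r, g, b) with one column per frame.
    Rows 0..P-1 hold distance pixels and rows P..2P-1 hold orientation
    pixels, where P is the number of joint pairs (an assumed layout).
    """
    pairs = list(combinations(range(len(frames[0])), 2))
    dists, orients = [], []
    for joints in frames:
        frame_d, frame_o = [], []
        for i, j in pairs:
            diff = [b - a for a, b in zip(joints[i], joints[j])]
            d = math.sqrt(sum(c * c for c in diff))
            frame_d.append(d)
            # Unit direction vector; zero vector if the joints coincide.
            frame_o.append([c / d if d > 0 else 0.0 for c in diff])
        dists.append(frame_d)
        orients.append(frame_o)

    # Min-max normalise distances over the whole sequence (assumption).
    flat = [d for frame_d in dists for d in frame_d]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0

    def to_byte(v):
        # Map a value in [0, 1] to an integer in [0, 255].
        return int(round(255 * v))

    image = []
    for p in range(len(pairs)):
        # Distance row: grey pixel whose intensity is the normalised distance.
        row = []
        for t in range(len(frames)):
            g = to_byte((dists[t][p] - lo) / span)
            row.append((g, g, g))
        image.append(row)
    for p in range(len(pairs)):
        # Orientation row: direction components in [-1, 1] mapped to RGB.
        row = []
        for t in range(len(frames)):
            row.append(tuple(to_byte((c + 1) / 2) for c in orients[t][p]))
        image.append(row)
    return image


# Toy sequence: 2 frames, 3 joints each (hypothetical coordinates).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 3.0)],
]
image = encode_sequence(frames)
```

With 3 joints there are 3 joint pairs, so the toy image has 6 rows and 2 columns. An image of this kind could then be resized and passed to a pre-trained convolutional network for fine-tuning, as the abstract describes; the color mapping actually used in the paper may differ from the one assumed here.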