Video-based human action recognition has recently come to play a vital role in many industrial applications, thanks to the popularity of depth sensors. Many conventional approaches, which combine handcrafted features with traditional classifiers, cannot cope with challenges in the field such as the complexity of human actions in realistic environments. To improve recognition performance by exploiting higher-level discriminative features, an efficient skeleton-based action recognition method using deep convolutional neural networks (CNNs) is studied, with an image encoder that transforms skeleton coordinate data into image-formed data. Because deep learning techniques are fundamentally designed to work with large datasets, overfitting usually occurs when CNNs are trained on small-scale datasets. To address this issue, a novel data augmentation technique is proposed for both information enrichment and overfitting prevention, wherein a skeleton sequence is depicted by multiple action images generated by randomly adding skeleton frames during data transformation and preparation of the training set. Experimental results on several challenging small-scale datasets demonstrate that the proposed method outperforms state-of-the-art approaches in terms of action recognition accuracy.
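
The two ideas summarized above, encoding a skeleton sequence as an image and augmenting it by randomly adding frames, can be illustrated with a minimal sketch. This is not the exact encoder or augmentation scheme of the proposed method; it rests on several assumptions: the sequence is an array of shape (frames, joints, 3), the x, y, z coordinates are min-max scaled to the three color channels, and "adding frames" is taken to mean duplicating randomly chosen existing frames in temporal order. The function names `encode_skeleton_image`, `add_random_frames`, and `augment` are hypothetical.

```python
import numpy as np


def encode_skeleton_image(seq):
    """Map a (frames, joints, 3) coordinate array to a (joints, frames, 3) uint8 image.

    Assumption: each coordinate axis (x, y, z) is min-max scaled to [0, 255]
    and used as one color channel.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat axes
    scaled = np.rint(255.0 * (seq - lo) / span).astype(np.uint8)
    return scaled.transpose(1, 0, 2)  # rows = joints, columns = time


def add_random_frames(seq, n_extra, rng):
    """Return a longer sequence with n_extra randomly chosen frames duplicated.

    Assumption: added frames are copies of existing frames, placed next to
    their originals so that temporal order is preserved.
    """
    seq = np.asarray(seq)
    picks = rng.integers(0, len(seq), size=n_extra)
    idx = np.sort(np.concatenate([np.arange(len(seq)), picks]))
    return seq[idx]


def augment(seq, n_images, n_extra, seed=0):
    """Generate several action images from a single skeleton sequence."""
    rng = np.random.default_rng(seed)
    return [
        encode_skeleton_image(add_random_frames(seq, n_extra, rng))
        for _ in range(n_images)
    ]
```

For a sequence of 10 frames and 20 joints, `augment(seq, n_images=4, n_extra=5)` returns four images of shape (20, 15, 3), each built from a different random set of duplicated frames. In practice the images would also be resized to the fixed input size expected by the CNN, a step omitted here.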