Guo J.,CAS Shenzhen Institutes of Advanced Technology | Guo J.,Chinese University of Hong Kong | Cheng J.,CAS Shenzhen Institutes of Advanced Technology | Cheng J.,Chinese University of Hong Kong | And 5 more authors.
Applied Mechanics and Materials

In this paper, we present a dynamic gesture recognition system. We focus on visual sensory information to recognize human activity in the form of hand movements from a small, predefined vocabulary. First, a fast and effective method for hand detection and tracking is presented to extract the hand trajectory. A novel method is then applied to correct the trajectory simply but effectively. Gesture recognition is achieved by a matching technique that determines the distance between the unknown input direction code sequence and a set of previously defined templates. A dynamic time warping (DTW) algorithm performs the time alignment and normalization by computing a temporal transformation that allows the two signals to be matched. Experimental results show that the proposed gesture recognition system achieves good results in real time. © (2013) Trans Tech Publications, Switzerland.
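
The template matching step described in this abstract can be illustrated with a minimal sketch. It assumes 8-way direction codes (0 to 7) and a cyclic distance between codes; the function names, the code alphabet, and the example templates are all hypothetical, since the abstract does not specify them.

```python
def direction_distance(a, b, n_codes=8):
    """Cyclic distance between two direction codes in 0..n_codes-1 (assumed coding)."""
    diff = abs(a - b) % n_codes
    return min(diff, n_codes - diff)


def dtw_distance(seq, template, n_codes=8):
    """Classic dynamic time warping cost between two direction code sequences."""
    n, m = len(seq), len(template)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq[:i] against template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = direction_distance(seq[i - 1], template[j - 1], n_codes)
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(seq, templates, n_codes=8):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(seq, templates[label], n_codes))


# Hypothetical templates and observation, for illustration only.
templates = {
    "right": [0, 0, 0, 0],
    "up": [2, 2, 2, 2],
    "right_then_up": [0, 0, 2, 2],
}
observed = [0, 0, 0, 1, 2, 2, 2, 2, 2]
print(classify(observed, templates))
```

Because DTW lets one template element absorb several consecutive observations, the longer observed sequence still aligns with the four-element template; only the single transitional code contributes to the cost.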

Xiao Q.,Chinese University of Hong Kong | Cheng J.,Chinese University of Hong Kong | Cheng J.,Guangdong Provincial Key Laboratory of Robotics and Intelligent System
2013 IEEE International Conference on Information and Automation, ICIA 2013

In this paper, we propose a framework that fuses multiple features for action recognition in depth sequences. Fusing multiple features is important for recognizing actions, since a representation based on a single feature is inadequate to capture the variations. Hence, we use two types of features: i) a quantized vocabulary of the local spatio-temporal descriptor HOG3D, and ii) a global projection-based descriptor that computes the HOG from the Depth Motion Maps. To optimally combine these features, we feed them to different classifiers, where SVM is applied to estimate the probabilities of action labels. We then weight these probabilities and sum them to find the action label with the maximum score. The proposed approach is tested on the publicly available MSR Action3D dataset, which demonstrates that fusing multiple features significantly improves performance, outperforming Li et al. [1] in most cases. © 2013 IEEE.
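
The weighted fusion of classifier probabilities described in this abstract can be sketched as follows. The feature names, weights, action labels, and probability values are made up for illustration; the abstract does not say how the weights are chosen, and the SVM stage that would produce the probabilities is not reproduced here.

```python
def fuse_scores(prob_by_feature, weights):
    """Weighted sum of per-classifier class probabilities."""
    labels = list(next(iter(prob_by_feature.values())))
    fused = {label: 0.0 for label in labels}
    for feature, probs in prob_by_feature.items():
        for label in labels:
            fused[label] += weights[feature] * probs[label]
    return fused


def predict(prob_by_feature, weights):
    """Pick the action label with the maximum fused score."""
    fused = fuse_scores(prob_by_feature, weights)
    return max(fused, key=fused.get)


# Hypothetical outputs of two probabilistic classifiers, one per feature type.
probs = {
    "hog3d_bow": {"wave": 0.5, "punch": 0.3, "kick": 0.2},
    "dmm_hog": {"wave": 0.2, "punch": 0.7, "kick": 0.1},
}
weights = {"hog3d_bow": 0.4, "dmm_hog": 0.6}  # assumed weights

print(predict(probs, weights))
```

Here the first classifier alone would choose "wave", but the more heavily weighted second classifier shifts the fused decision to "punch".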

Jiang J.,CAS Shenzhen Institutes of Advanced Technology | Jiang J.,Chinese University of Hong Kong | Jiang J.,Shenzhen Key Laboratory of Computer Vision and Pattern Recognition | Cheng J.,CAS Shenzhen Institutes of Advanced Technology | And 3 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Real-time 3D sensing plays a critical role in robotic navigation, video surveillance, and human-computer interaction. When computing 3D structures of dynamic scenes from stereo sequences, spatiotemporal stereo and scene flow methods can produce temporally coherent disparity. However, most existing methods do not make sufficient use of the previous disparity map when computing the next one, and the search space of correspondences limits the speed of disparity computation for each image pair. This paper proposes an effective scheme to predict disparity maps from stereo sequences. In particular, we apply a robust 3D registration algorithm based on the angular-invariant feature to estimate the ego-motion of the stereo rig between consecutive frames, and present the transformation between consecutive disparity maps. The scheme can rapidly produce a sequence of temporally coherent disparity maps. We apply the new scheme to real outdoor scenes, and thorough empirical studies indicate its effectiveness for practical applications. © 2013 Springer-Verlag.
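
One way to picture a transformation between consecutive disparity maps is the per-pixel sketch below. It assumes a rectified pinhole stereo rig (depth Z = f·B/d) and a rigid ego-motion (R, t) that is already known; the registration step based on the angular-invariant feature is not reproduced, and the function name, parameters, and calibration values are hypothetical rather than taken from the paper.

```python
def predict_disparity(u, v, d, rotation, translation, focal, baseline, cx, cy):
    """Map pixel (u, v) with disparity d into the next frame.

    rotation (3x3 nested lists) and translation (length 3) take points from
    the current camera frame to the next one: p_next = R p + t.
    Returns (u_next, v_next, d_next), or None for an invalid disparity or a
    point that ends up behind the camera.
    """
    if d <= 0:
        return None
    # Back-project the pixel to a 3D point in the current camera frame.
    z = focal * baseline / d
    p = ((u - cx) * z / focal, (v - cy) * z / focal, z)
    # Apply the rigid ego-motion.
    xn, yn, zn = (
        sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if zn <= 0:
        return None
    # Re-project and convert the new depth back to disparity.
    return (focal * xn / zn + cx, focal * yn / zn + cy, focal * baseline / zn)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
forward_one_metre = [0.0, 0.0, -1.0]  # scene points move 1 m closer to the camera

# Assumed calibration: f = 500 px, baseline = 0.1 m, principal point (320, 240).
print(predict_disparity(420, 240, 10.0, identity, forward_one_metre, 500.0, 0.1, 320.0, 240.0))
```

A predicted disparity of this kind could serve as the centre of a narrow correspondence search in the next frame, which is the sort of saving the abstract attributes to reusing the previous disparity map.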

Cheng J.,CAS Shenzhen Institutes of Advanced Technology | Cheng J.,Chinese University of Hong Kong | Cheng J.,Guangdong Provincial Key Laboratory of Robotics and Intelligent System | Bian W.,Intelligent Systems Technology, Inc. | Tao D.,Intelligent Systems Technology, Inc.
Information Sciences

Gesture recognition plays an important role in human-machine interaction (HMI) for multimedia entertainment. In this paper, we present a dimension reduction based approach for dynamic real-time hand gesture recognition. The hand gestures are recorded as acceleration signals by a handheld device with a built-in 3-axis accelerometer, and are represented by discrete cosine transform (DCT) coefficients. To recognize different hand gestures, we develop a new dimension reduction method, locally regularized sliced inverse regression (LR-SIR), to find an effective low-dimensional subspace in which different hand gestures are well separable. Recognition can then be performed with simple and efficient classifiers, e.g., nearest mean, the k-nearest-neighbor rule, and the support vector machine. LR-SIR is built upon the well-known sliced inverse regression (SIR), but overcomes SIR's limitation of ignoring the local geometry of the data distribution. Moreover, LR-SIR can be solved effectively and efficiently by eigen-decomposition. Finally, we apply the LR-SIR based gesture recognition to control our recently developed dance robot for multimedia entertainment. Thorough empirical studies on 'digits'-gesture recognition suggest the effectiveness of the new gesture recognition scheme for HMI. © 2012 Elsevier Inc. All rights reserved.
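
The representation and classification stages mentioned in this abstract can be sketched in a few lines. The example below computes an unnormalized DCT-II per accelerometer axis, keeps the first few coefficients, and applies a nearest-mean classifier directly to them. LR-SIR itself is not implemented, so the dimension reduction step is skipped; the synthetic signals, labels, and the number of retained coefficients are assumptions.

```python
import math


def dct2(signal, n_coeffs):
    """First n_coeffs coefficients of the unnormalized DCT-II of a 1-D signal."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n_coeffs)
    ]


def gesture_features(samples, n_coeffs=4):
    """Concatenate low-order DCT coefficients of the x, y, and z axes."""
    feats = []
    for axis in range(3):
        feats.extend(dct2([s[axis] for s in samples], n_coeffs))
    return feats


def class_means(training):
    """Mean feature vector per gesture label."""
    means = {}
    for label, examples in training.items():
        dim = len(examples[0])
        means[label] = [sum(e[d] for e in examples) / len(examples) for d in range(dim)]
    return means


def nearest_mean(feature, means):
    """Label whose class mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(feature, means[label]))


def swing(axis, amplitude, n=16):
    """Synthetic one-period swing on a single axis (illustrative data only)."""
    out = []
    for i in range(n):
        v = [0.0, 0.0, 0.0]
        v[axis] = amplitude * math.sin(2 * math.pi * i / n)
        out.append(tuple(v))
    return out


training = {
    "x_swing": [gesture_features(swing(0, a)) for a in (0.9, 1.1)],
    "z_swing": [gesture_features(swing(2, a)) for a in (0.9, 1.1)],
}
means = class_means(training)
print(nearest_mean(gesture_features(swing(0, 0.95)), means))
```

In the pipeline the abstract describes, the DCT feature vector would first be projected into the learned low-dimensional subspace, and the nearest-mean rule would operate there instead.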

Li X.,Southwest Jiaotong University | He H.,Southwest Jiaotong University | Yin Z.,Southwest Jiaotong University | Yin Z.,CAS Institute of Remote Sensing | And 3 more authors.

In this paper, we present a novel learning-based single image super-resolution algorithm to address the problems of inefficient learning and improper estimation in coping with nonlinear high-dimensional feature data. Our method, named subspace projection and neighbor embedding (SPNE), first projects the high-dimensional data into two different subspaces, i.e., the kernel principal component analysis (KPCA) subspace and the modified locality preserving projection (MLPP) subspace, to obtain the global and local structures of the data. In an optimal low-dimensional feature space, the k-nearest neighbors of each input low-resolution (LR) image patch can be found for efficient learning. Then, using similarity measures and proportional factors, the k embedding weights are used to estimate high-frequency information from a training dataset. Finally, we apply iterative back projection (IBP) to further enhance the super-resolution results. Experiments on simulated and real LR images demonstrate that the proposed approach outperforms existing NE-based super-resolution methods in terms of visual quality and selected objective metrics. © 2014 Elsevier B.V.
