Advanced Digital science Center

Singapore, Singapore

Deng W.,Beijing University of Posts and Telecommunications | Hu J.,Beijing University of Posts and Telecommunications | Lu J.,Advanced Digital science Center | Guo J.,Beijing University of Posts and Telecommunications
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2014

We develop a transform-invariant PCA (TIPCA) technique that aims to accurately characterize the intrinsic structures of the human face that are invariant to in-plane transformations of the training images. Specifically, TIPCA alternately aligns the image ensemble and creates the optimal eigenspace, with the objective of minimizing the mean square error between the aligned images and their reconstructions. Learning from the FERET facial image ensemble of 1,196 subjects validates the mutual promotion between image alignment and eigenspace representation, which eventually leads to optimized coding and recognition performance that surpasses handcrafted alignment based on facial landmarks. Experimental results also suggest that state-of-the-art invariant descriptors, such as local binary patterns (LBP), histograms of oriented gradients (HOG), and Gabor energy filters (GEF), and classification methods, such as sparse representation-based classification (SRC) and support vector machines (SVM), can benefit from using TIPCA-aligned faces instead of the manually eye-aligned faces that are widely regarded as the ground-truth alignment. Accuracies that compare favorably with state-of-the-art results on face coding and face recognition are reported. © 2013 IEEE.

Ni B.,Advanced Digital science Center | Wang G.,Advanced Digital science Center | Moulin P.,University of Illinois at Urbana - Champaign
Proceedings of the IEEE International Conference on Computer Vision | Year: 2011

In this paper, we present a benchmark database for home-monitoring-oriented human activity recognition, based on the combination of a color video camera and a depth sensor. Our contributions are twofold: 1) We have created a publicly releasable human activity video database, named RGBD-HuDaAct, which contains synchronized color-depth video streams for the task of human daily activity recognition. This database aims to encourage more research on human activity recognition based on multimodal sensor combinations (e.g., color plus depth). 2) Two multimodal fusion schemes, which naturally combine color and depth information, have been developed from two state-of-the-art feature representation methods for action recognition, i.e., spatio-temporal interest points (STIPs) and motion history images (MHIs). These depth-extended feature representations are evaluated comprehensively and demonstrate superior recognition performance over their single-modality (e.g., color-only) counterparts. © 2011 IEEE.
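The motion history image (MHI) mentioned above has a simple, widely used update rule: pixels where motion is detected are set to a maximum value tau, and all other pixels decay by one step toward zero, so recent motion looks brighter than older motion. The Python sketch below shows that classic update on small 2D lists. Frame differencing as the motion detector and the names `update_mhi`, `frame_difference_mask`, and `motion_history` are illustrative assumptions; the paper's depth-extended fusion scheme is not reproduced here.

```python
def frame_difference_mask(prev, curr, threshold):
    """Binary motion mask from absolute frame differencing (an assumed detector)."""
    return [
        [abs(c - p) > threshold for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(prev, curr)
    ]


def update_mhi(mhi, motion_mask, tau):
    """One MHI step: moving pixels are set to tau, others decay by 1 (floored at 0)."""
    return [
        [tau if moving else max(0, h - 1) for h, moving in zip(h_row, m_row)]
        for h_row, m_row in zip(mhi, motion_mask)
    ]


def motion_history(frames, tau, threshold):
    """Accumulate an MHI over a list of equally sized 2D frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    mhi = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, frame_difference_mask(prev, curr, threshold), tau)
    return mhi


# A 1 x 3 "video" in which a bright blob moves one pixel to the right per frame.
frames = [[[0, 0, 0]], [[0, 9, 0]], [[0, 9, 9]]]
print(motion_history(frames, tau=3, threshold=5))  # → [[0, 2, 3]]
```

One naive way to bring in depth, again an assumption rather than the paper's fusion scheme, is to run `motion_history` separately on the color and depth streams and concatenate the two maps as features.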

Hu J.,Nanyang Technological University | Lu J.,Advanced Digital science Center | Tan Y.-P.,Nanyang Technological University
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2014

This paper presents a new discriminative deep metric learning (DDML) method for face verification in the wild. Unlike existing metric learning-based face verification methods, which learn a Mahalanobis distance metric that simultaneously maximizes inter-class variations and minimizes intra-class variations, the proposed DDML trains a deep neural network that learns a set of hierarchical nonlinear transformations to project face pairs into the same feature subspace. In this subspace, the distance of each positive face pair is less than a smaller threshold and that of each negative pair is greater than a larger threshold, so that discriminative information can be exploited in the deep network. Our method achieves very competitive face verification performance on the widely used LFW and YouTube Faces (YTF) datasets. © 2014 IEEE.
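The double-threshold constraint described above can be written as a pairwise loss. The Python sketch below is an illustration under assumptions, not the authors' implementation: two precomputed embedding vectors stand in for the deep network's outputs, the distance is squared Euclidean, and the two thresholds are placed at tau - 1 and tau + 1 around a hypothetical parameter `tau`, with a smooth hinge (generalized logistic) whose sharpness is set by a hypothetical `beta`.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smooth_hinge(z, beta=1.0):
    """Generalized logistic (1/beta) * log(1 + exp(beta * z)), a smooth max(0, z).

    Written in an overflow-safe form; larger beta makes it closer to a hard hinge.
    """
    bz = beta * z
    if bz > 0:
        return (bz + math.log1p(math.exp(-bz))) / beta
    return math.log1p(math.exp(bz)) / beta


def pair_loss(emb_a, emb_b, same_person, tau, beta=1.0):
    """Smooth hinge on 1 - l * (tau - d^2), with l = +1 (same) or -1 (different).

    The hinge argument is negative exactly when a positive pair has d^2 < tau - 1
    or a negative pair has d^2 > tau + 1, i.e. when the pair satisfies its threshold.
    """
    label = 1.0 if same_person else -1.0
    d2 = squared_distance(emb_a, emb_b)
    return smooth_hinge(1.0 - label * (tau - d2), beta)
```

In actual training, a network would produce the embeddings and the gradient of this loss would be backpropagated through it; that part is outside this sketch.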

Lu J.,Advanced Digital science Center | Tan Y.-P.,Nanyang Technological University
IEEE Transactions on Information Forensics and Security | Year: 2013

Conventional subspace-based face recognition methods seek low-dimensional feature subspaces to achieve high classification accuracy and assume that all types of misclassification incur the same loss. This assumption, however, may not hold in many practical face recognition systems, as different types of misclassification can lead to different losses. Motivated by this concern, this paper proposes a cost-sensitive subspace analysis approach for face recognition. Our approach incorporates a cost matrix, which specifies the costs of different types of misclassification, into two popular discriminative subspace analysis methods, yielding cost-sensitive linear discriminant analysis (CSLDA) and cost-sensitive marginal Fisher analysis (CSMFA), which aim to achieve minimum overall recognition loss by performing recognition in the learned low-dimensional subspaces. To better exploit the complementary information in multiple features, we further propose a multiview cost-sensitive subspace analysis approach that seeks a common feature subspace in which multiple face features are fused to improve recognition performance. Extensive experimental results demonstrate the effectiveness of the proposed methods. © 2005-2012 IEEE.
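The premise that different misclassifications carry different losses rests on the minimum-expected-cost decision rule. The Python sketch below shows only that rule, not the CSLDA or CSMFA subspace learning: given class posteriors and a cost matrix `cost[j][k]` (the loss of predicting class k when the true class is j), it picks the label with the lowest expected cost. The function name and the example costs are hypothetical.

```python
def min_expected_cost_label(posteriors, cost):
    """Return (label, expected_costs) minimizing sum_j cost[j][k] * posteriors[j].

    cost[j][k] is the loss of predicting class k when the true class is j;
    the diagonal is normally zero (correct decisions cost nothing).
    """
    n = len(posteriors)
    expected = [sum(cost[j][k] * posteriors[j] for j in range(n)) for k in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected


# Class 0 = impostor, class 1 = enrolled person (hypothetical costs):
# accepting an impostor costs 10, rejecting a genuine user costs 1.
label, expected = min_expected_cost_label([0.3, 0.7], [[0, 10], [1, 0]])
print(label)  # → 0, even though class 1 has the higher posterior
```

With a uniform 0/1 cost matrix the rule reduces to choosing the class with the highest posterior, which is the equal-loss assumption the abstract questions.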

Lu J.,Advanced Digital science Center | Yang H.,Advanced Digital science Center | Min D.,Advanced Digital science Center | Do M.N.,University of Illinois at Urbana - Champaign
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2013

Though many tasks in computer vision can be formulated elegantly as pixel-labeling problems, a typical challenge discouraging such a discrete formulation is often its computational cost. Recent studies on fast cost volume filtering based on efficient edge-aware filters have provided a fast alternative for solving discrete labeling problems, with complexity independent of the support window size. However, these methods still have to step through the entire cost volume exhaustively, so their running time scales linearly with the size of the label space. When the label space is huge, which is often the case for (subpixel-accurate) stereo and optical flow estimation, their computational complexity quickly becomes unacceptable. Developed to search for approximate nearest neighbors rapidly, the PatchMatch method can significantly reduce the dependency of the complexity on the search space size. However, its pixel-wise randomized search and fragmented data access within the 3D cost volume seriously hinder the application of efficient cost slice filtering. This paper presents a generic and fast computational framework for general multi-labeling problems called PatchMatch Filter (PMF). For the first time, we explore effective and efficient strategies to weave together these two fundamental techniques developed in isolation, i.e., PatchMatch-based randomized search and efficient edge-aware image filtering. By decomposing an image into compact superpixels, we also propose novel superpixel-based search strategies that generalize and improve the original PatchMatch method. Focusing on dense correspondence field estimation, we demonstrate PMF's applications in stereo and optical flow. Our PMF methods achieve state-of-the-art correspondence accuracy but run much faster than other competing methods, often giving a speedup of more than 10 times for large label spaces. © 2013 IEEE.
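To make the pixel-wise randomized search concrete, the Python sketch below runs the basic PatchMatch idea (random initialization, propagation from the neighbor visited just before, and random search in a shrinking window) on a single 1D scanline, using a sum-of-absolute-differences box-window cost. It is an illustration under stated assumptions, not PMF: there are no superpixels and no edge-aware cost filtering, and all names and parameters (`max_disp`, `half`, `iterations`, `seed`) are hypothetical.

```python
import random


def window_cost(left, right, x, d, half):
    """SAD cost of matching left pixel x to right pixel x - d over a 1D box window.

    Indices are clamped to the signal so border pixels still get a cost.
    """
    n = len(left)
    total = 0
    for i in range(-half, half + 1):
        xl = min(max(x + i, 0), n - 1)
        xr = min(max(x + i - d, 0), n - 1)
        total += abs(left[xl] - right[xr])
    return total


def patchmatch_scanline(left, right, max_disp, half=2, iterations=4, seed=0):
    """1D PatchMatch over integer disparities 0..max_disp (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(left)
    # Random initialization: every pixel starts with a random disparity label.
    disp = [rng.randint(0, max_disp) for _ in range(n)]
    cost = [window_cost(left, right, x, disp[x], half) for x in range(n)]

    def try_label(x, d):
        # Keep a candidate label only if it strictly lowers this pixel's cost.
        c = window_cost(left, right, x, d, half)
        if c < cost[x]:
            disp[x], cost[x] = d, c

    for it in range(iterations):
        # Alternate scan direction so good labels can spread both ways.
        forward = it % 2 == 0
        order = range(n) if forward else range(n - 1, -1, -1)
        prev = -1 if forward else 1
        for x in order:
            nb = x + prev
            if 0 <= nb < n:
                try_label(x, disp[nb])  # propagation from the neighbor visited just before
            radius = max_disp
            while radius >= 1:
                # Random search in a window around the current label, halving each time.
                lo = max(0, disp[x] - radius)
                hi = min(max_disp, disp[x] + radius)
                try_label(x, rng.randint(lo, hi))
                radius //= 2
    return disp
```

Each pixel evaluates only a handful of labels per pass instead of all `max_disp + 1` of them, which is the property that makes exhaustive cost-volume filtering expensive by comparison when the label space is large.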

Zhang T.,Advanced Digital science Center | Ghanem B.,King Abdullah University of Science and Technology | Liu S.,National University of Singapore | Ahuja N.,University of Illinois at Urbana - Champaign
International Journal of Computer Vision | Year: 2013

In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in Multi-Task Tracking (MTT). By employing popular sparsity-inducing ℓp,q mixed norms (specifically p ∈ {2, ∞} and q = 1), we regularize the representation problem to enforce joint sparsity and learn the particle representations together. Compared to previous methods that handle particles independently, our results demonstrate that mining the interdependencies between particles improves tracking performance and reduces overall computational complexity. Interestingly, we show that the popular L1 tracker (Mei and Ling, IEEE Trans Pattern Anal Mach Intell 33(11):2259-2272, 2011) is a special case of our MTT formulation (denoted the L1,1 tracker) when p = q = 1. Under the MTT framework, some of the tasks (particle representations) are often more closely related and more likely to share common relevant covariates than other tasks. Therefore, we extend the MTT framework to take into account pairwise structural correlations between particles (e.g., spatial smoothness of representation) and denote the novel framework S-MTT. The problem of learning the regularized sparse representation in MTT and S-MTT can be solved efficiently using an Accelerated Proximal Gradient (APG) method that yields a sequence of closed-form updates. As such, S-MTT and MTT are computationally attractive. We test our proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that S-MTT performs much better than MTT, and both methods consistently outperform state-of-the-art trackers. © 2012 Springer Science+Business Media New York.
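For the p = 2, q = 1 case, the closed-form update inside an accelerated proximal gradient solver is row-wise group soft-thresholding. The Python sketch below assumes a least-squares data term 0.5 * ||D X - Y||_F^2, where D holds templates as columns, Y holds particle observations as columns, and X (templates x particles) holds the coefficients; the ℓ2,1 penalty then zeroes out whole rows, i.e., templates that no particle uses. It uses a standard FISTA-style momentum schedule and omits the S-MTT structural term, so it is a sketch of the MTT building block under these assumptions, not the paper's solver.

```python
import math


def prox_l21_rows(X, lam):
    """Proximal operator of lam * sum_i ||X[i]||_2 (row-wise group soft-thresholding).

    Rows are templates and columns are particles, so a template is either kept
    (shrunk) for all particles or removed for all of them: joint sparsity.
    """
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def apg_l21(D, Y, lam, step, iters=100):
    """FISTA-style solver for min_X 0.5 * ||D X - Y||_F^2 + lam * ||X||_{2,1}.

    D: features x templates, Y: features x particles, X: templates x particles.
    step should not exceed 1 / (largest singular value of D)^2.
    """
    m, n = len(D[0]), len(Y[0])
    X = [[0.0] * n for _ in range(m)]
    Z = [row[:] for row in X]
    t = 1.0
    Dt = transpose(D)
    for _ in range(iters):
        # Gradient of the least-squares term at the extrapolated point Z.
        R = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(matmul(D, Z), Y)]
        G = matmul(Dt, R)
        V = [[z - step * g for z, g in zip(zr, gr)] for zr, gr in zip(Z, G)]
        X_new = prox_l21_rows(V, step * lam)  # closed-form proximal step
        # Nesterov momentum schedule.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        Z = [[xn + beta * (xn - xo) for xn, xo in zip(rn, ro)] for rn, ro in zip(X_new, X)]
        X, t = X_new, t_new
    return X
```

With an identity dictionary and unit step, the solver returns `prox_l21_rows(Y, lam)` after the first iteration: a strongly supported template row is shrunk, and a weakly supported one is dropped for every particle at once.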

Wang M.,National University of Singapore | Wang M.,Hefei University of Technology | Ni B.,Advanced Digital Science Center | Hua X.-S.,Microsoft | Chua T.-S.,National University of Singapore
ACM Computing Surveys | Year: 2012

Along with the explosive growth of multimedia data, automatic multimedia tagging has attracted great interest from various research communities, such as computer vision, multimedia, and information retrieval. However, despite the great progress achieved in the past two decades, automatic tagging technologies still struggle to achieve satisfactory performance on real-world multimedia data that vary widely in genre, quality, and content. Meanwhile, the power of human intelligence has been fully demonstrated in the Web 2.0 era: if well motivated, Internet users are able to tag a large amount of multimedia data. Therefore, a set of new techniques that combine humans and computers for more accurate and efficient multimedia tagging has been developed, such as batch tagging, active tagging, tag recommendation, and tag refinement. These techniques accomplish multimedia tagging by jointly exploiting humans and computers in different ways. This article refers to them collectively as assistive tagging and conducts a comprehensive survey of existing research efforts on this theme. We first introduce the status of automatic and manual tagging and then explain why assistive tagging can be a good solution. We categorize existing assistive tagging techniques into three paradigms: (1) tagging with data selection and organization; (2) tag recommendation; and (3) tag processing. We introduce the research efforts on each paradigm and summarize the methodologies. We also discuss several future trends in this research direction. © 2012 ACM.

Min D.,Advanced Digital science Center | Lu J.,Advanced Digital science Center | Do M.N.,University of Illinois at Urbana - Champaign
IEEE Transactions on Image Processing | Year: 2012

This paper presents a novel approach for depth video enhancement. Given a high-resolution color video and its corresponding low-quality depth video, we improve the quality of the depth video by increasing its resolution and suppressing noise. To this end, a weighted mode filtering method based on a joint histogram is proposed. When the histogram is generated, a weight based on the color similarity between the reference and neighboring pixels in the color image is computed and then used for counting each bin of the joint histogram of the depth map. The final solution is determined by seeking the global mode of the histogram. We show that the proposed method provides the optimal solution with respect to L1-norm minimization. To obtain temporally consistent estimates for depth video, we extend this method to temporally neighboring frames. Simple optical flow estimation and a patch similarity measure are used to obtain high-quality depth video efficiently. Experimental results show that the proposed method achieves outstanding performance and is very efficient compared with existing methods. We also show that temporally consistent enhancement of depth video reduces flickering and improves the accuracy of the depth video. © 2011 IEEE.
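The joint-histogram voting step described above can be sketched on a 1D scanline. In this simplified Python version, depth values are assumed to be integer bin indices in [0, num_bins), only the color-similarity weight is used (a Gaussian with a hypothetical width `sigma_c`), and the upsampling and temporal extension are omitted; it shows the mode-seeking idea, not the paper's full filter.

```python
import math


def weighted_mode_filter_1d(depth, color, radius, sigma_c, num_bins):
    """Simplified 1D weighted mode filter.

    For each pixel, every neighbor within `radius` votes for its depth bin with a
    weight given by color similarity to the center pixel; the output is the bin
    with the largest accumulated weight (the mode of the weighted histogram).
    """
    n = len(depth)
    out = []
    for x in range(n):
        hist = [0.0] * num_bins
        for y in range(max(0, x - radius), min(n, x + radius + 1)):
            w = math.exp(-((color[x] - color[y]) ** 2) / (2.0 * sigma_c ** 2))
            hist[depth[y]] += w
        out.append(max(range(num_bins), key=hist.__getitem__))
    return out


# A color edge between pixels 4 and 5, with one outlier depth on each side.
color = [10] * 5 + [200] * 5
depth = [2, 2, 7, 2, 2, 6, 6, 6, 1, 6]
print(weighted_mode_filter_1d(depth, color, radius=2, sigma_c=10.0, num_bins=8))
# → [2, 2, 2, 2, 2, 6, 6, 6, 6, 6]
```

The outliers are voted away by similarly colored neighbors, while the near-zero weights across the color edge keep depths 2 and 6 from bleeding into each other.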

Min D.,Advanced Digital science Center | Lu J.,Advanced Digital science Center | Do M.N.,University of Illinois at Urbana - Champaign
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2013

This paper presents a novel method for performing efficient cost aggregation in stereo matching. The cost aggregation problem is reformulated from the perspective of a histogram, giving us the potential to significantly reduce the complexity of cost aggregation in stereo matching. Unlike previous methods, which have tried to reduce the complexity in terms of the size of the image and the matching window, our approach focuses on reducing the computational redundancy that exists across the search range, caused by repeated filtering for all hypotheses. Moreover, we also reduce the complexity of window-based filtering through an efficient sampling scheme inside the matching window. The tradeoff between accuracy and complexity is extensively investigated by varying the parameters used in the proposed method. Experimental results show that the proposed method provides high-quality disparity maps with low complexity and outperforms existing local methods. This paper also provides new insights into complexity-constrained stereo matching algorithm design. © 1979-2012 IEEE.

Lu J.,Advanced Digital science Center | Tan Y.-P.,Nanyang Technological University | Wang G.,Nanyang Technological University
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2013

Conventional appearance-based face recognition methods usually assume that multiple samples per person (MSPP) are available for discriminative feature extraction during the training phase. In many practical face recognition applications, such as law enforcement, e-passports, and ID card identification, this assumption may not hold, as only a single sample per person (SSPP) is enrolled or recorded in these systems. Many popular face recognition methods fail to work well in this scenario because there are not enough samples for discriminant learning. To address this problem, we propose a novel discriminative multimanifold analysis (DMMA) method that learns discriminative features from image patches. First, we partition each enrolled face image into several nonoverlapping patches to form an image set for each sample per person. Then, we formulate SSPP face recognition as a manifold-manifold matching problem and learn multiple DMMA feature spaces to maximize the manifold margins of different persons. Finally, we present a reconstruction-based manifold-manifold distance to identify unlabeled subjects. Experimental results on three widely used face databases demonstrate the efficacy of the proposed approach. © 1979-2012 IEEE.
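The first step described above, turning a single enrolled image into an image set, can be sketched as nonoverlapping patch extraction. In this Python sketch the patch size is a hypothetical parameter, and border rows or columns that do not fill a whole patch are dropped (an assumption); the DMMA feature learning and the reconstruction-based manifold-manifold distance are not shown.

```python
def partition_into_patches(image, patch_h, patch_w):
    """Split a 2D image (list of rows) into nonoverlapping patch_h x patch_w patches.

    Each patch is flattened row by row into a vector, so one face image becomes a
    set of vectors. Leftover border rows/columns are dropped (an assumption).
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_h + 1, patch_h):
        for c in range(0, cols - patch_w + 1, patch_w):
            patches.append(
                [image[r + i][c + j] for i in range(patch_h) for j in range(patch_w)]
            )
    return patches


image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(partition_into_patches(image, 2, 2))
# → [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

A 64 x 64 face split into 8 x 8 patches, for example, yields a set of 64 vectors of length 64 for that person, which is what makes set-based (manifold) matching possible from a single sample.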