
Wang G., Nanyang Technological University | Wang G., Advanced Digital Sciences Center | Hoiem D., University of Illinois at Urbana-Champaign | Forsyth D., University of Illinois at Urbana-Champaign
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2012

Measuring image similarity is a central topic in computer vision. In this paper, we propose to measure image similarity by learning from online Flickr image groups. We do so by choosing 103 Flickr groups, building a one-versus-all multiclass classifier to classify test images into these groups, taking the set of classifier responses as features, and calculating the distance between feature vectors to measure image similarity. Experimental results on the Corel dataset and the PASCAL VOC 2007 dataset show that our approach performs better on image matching, retrieval, and classification than conventional visual features. To build our similarity measure, we need one-versus-all classifiers that are accurate and can be trained quickly on very large quantities of data. We adopt an SVM classifier with a histogram intersection kernel and describe a novel fast training algorithm for this classifier: the Stochastic Intersection Kernel MAchine (SIKMA) training algorithm. This method can produce a kernel classifier that is more accurate than a linear classifier, trained on tens of thousands of examples in minutes. © 2012 IEEE.
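The two central ingredients of the abstract above — the histogram intersection kernel and the use of per-group classifier responses as a feature vector for similarity — can be illustrated with a minimal NumPy sketch. This is not the authors' SIKMA implementation; the toy "classifiers" below are hypothetical linear scorers standing in for trained one-versus-all group classifiers.

```python
import numpy as np

def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return float(np.minimum(x, y).sum())

def group_response_similarity(img_a, img_b, classifiers):
    """Represent each image by its vector of classifier responses
    (one score per group), then compare the response vectors.
    Larger (less negative) value means more similar."""
    feat_a = np.array([clf(img_a) for clf in classifiers])
    feat_b = np.array([clf(img_b) for clf in classifiers])
    return -float(np.linalg.norm(feat_a - feat_b))

# Toy demo: 5 hypothetical "group classifiers" scoring 8-bin histograms.
rng = np.random.default_rng(0)
weights = rng.random((5, 8))
classifiers = [lambda h, w=w: float(w @ h) for w in weights]
h1 = rng.random(8)
h2 = rng.random(8)
sim = group_response_similarity(h1, h2, classifiers)
```

An image compared with itself yields identical response vectors, so its similarity score is exactly zero, the maximum possible under this (negated-distance) measure.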

Deng W., Beijing University of Posts and Telecommunications | Hu J., Beijing University of Posts and Telecommunications | Lu J., Advanced Digital Sciences Center | Guo J., Beijing University of Posts and Telecommunications
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2014

We develop a transform-invariant PCA (TIPCA) technique that aims to accurately characterize the intrinsic structures of the human face that are invariant to in-plane transformations of the training images. Specifically, TIPCA alternately aligns the image ensemble and creates the optimal eigenspace, with the objective of minimizing the mean square error between the aligned images and their reconstructions. Learning from the FERET facial image ensemble of 1,196 subjects validates the mutual promotion between image alignment and eigenspace representation, which eventually leads to optimized coding and recognition performance that surpasses handcrafted alignment based on facial landmarks. Experimental results also suggest that state-of-the-art invariant descriptors, such as the local binary pattern (LBP), histogram of oriented gradients (HOG), and Gabor energy filter (GEF), and classification methods, such as sparse representation-based classification (SRC) and the support vector machine (SVM), can benefit from using TIPCA-aligned faces instead of the manually eye-aligned faces that are widely regarded as the ground-truth alignment. Favorable accuracies against the state-of-the-art results on face coding and face recognition are reported. © 2013 IEEE.
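The alternation that TIPCA describes — align the ensemble given the current eigenspace, then refit the eigenspace given the alignment — can be sketched in miniature. The sketch below is a toy under strong assumptions: "images" are 1-D signals and the in-plane transform is just a circular shift, whereas the paper handles genuine 2-D transformations. Each step minimizes the same reconstruction error, so the objective is non-increasing across iterations.

```python
import numpy as np

def fit_eigenspace(X, k):
    """PCA on the rows of X: mean plus top-k principal directions."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruction_error(x, mu, V):
    """Squared error of reconstructing x from its k PCA coefficients."""
    c = V @ (x - mu)
    return float(np.linalg.norm(x - mu - V.T @ c) ** 2)

def tipca_toy(images, k=2, shifts=range(-2, 3), iters=5):
    """Alternate (a) aligning each signal by the transform (here a
    1-D circular shift) minimizing its reconstruction error under
    the current eigenspace, and (b) refitting the eigenspace."""
    aligned = np.array(images, dtype=float)
    for _ in range(iters):
        mu, V = fit_eigenspace(aligned, k)
        aligned = np.array([
            min((np.roll(x, s) for s in shifts),
                key=lambda y: reconstruction_error(y, mu, V))
            for x in aligned])
    return aligned, fit_eigenspace(aligned, k)
```

On shifted copies of a common signal, the aligned ensemble admits a lower-error low-rank eigenspace than the unaligned one, mirroring the "mutual promotion" between alignment and eigenspace representation noted in the abstract.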

Lu J., Advanced Digital Sciences Center | Zhou X., Capital Normal University | Tan Y.-P., Nanyang Technological University | Shang Y., Capital Normal University | Zhou J., Tsinghua University
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2014

Kinship verification from facial images is an interesting and challenging problem in computer vision, and there have been very few attempts to tackle it in the literature. In this paper, we propose a new neighborhood repulsed metric learning (NRML) method for kinship verification. Motivated by the fact that interclass samples (those without a kinship relation) with higher similarity usually lie in a neighborhood and are more easily misclassified than those with lower similarity, we aim to learn a distance metric under which intraclass samples (those with a kinship relation) are pulled as close together as possible while interclass samples lying in a neighborhood are simultaneously repulsed and pushed as far away as possible, so that more discriminative information can be exploited for verification. To make better use of multiple feature descriptors that extract complementary information, we further propose a multiview NRML (MNRML) method that seeks a common distance metric to perform multiple-feature fusion and improve kinship verification performance. Experimental results are presented to demonstrate the efficacy of our proposed methods. Finally, we also test human ability in kinship verification from facial images, and our experimental results show that the performance of our methods is comparable to that of human observers. © 1979-2012 IEEE.
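The pull/push intuition behind NRML — shrink distances of kinship pairs, grow distances to nearby non-kin pairs under a learned Mahalanobis metric — can be sketched as a single gradient step. This is a simplified stand-in, not the authors' NRML objective: the neighborhood selection and the full iterative solver are omitted, and the negative pairs are assumed to be pre-selected repulsed neighbors.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

def nrml_step(M, kin_pairs, neg_neighbors, lr=0.01):
    """One gradient step on a simplified NRML-style objective:
    decrease distances of kinship pairs, increase distances to the
    repulsed interclass neighbors, then project M back onto the
    positive semidefinite cone so it remains a valid metric."""
    G = np.zeros_like(M)
    for x, y in kin_pairs:          # pull intraclass pairs together
        d = (x - y)[:, None]
        G += d @ d.T
    for x, y in neg_neighbors:      # push neighboring non-kin apart
        d = (x - y)[:, None]
        G -= d @ d.T
    M = M - lr * G
    w, V = np.linalg.eigh(M)        # PSD projection: clip eigenvalues
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```

Starting from the Euclidean metric (the identity matrix), one step moves the metric in the intended direction: kin pairs get closer, repulsed non-kin pairs get farther apart.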

Wang M., National University of Singapore | Wang M., Hefei University of Technology | Ni B., Advanced Digital Sciences Center | Hua X.-S., Microsoft | Chua T.-S., National University of Singapore
ACM Computing Surveys | Year: 2012

Along with the explosive growth of multimedia data, automatic multimedia tagging has attracted great interest from various research communities, such as computer vision, multimedia, and information retrieval. However, despite the great progress achieved in the past two decades, automatic tagging technologies still can hardly achieve satisfactory performance on real-world multimedia data that vary widely in genre, quality, and content. Meanwhile, the power of human intelligence has been fully demonstrated in the Web 2.0 era. If well motivated, Internet users are able to tag a large amount of multimedia data. Therefore, a set of new techniques has been developed by combining humans and computers for more accurate and efficient multimedia tagging, such as batch tagging, active tagging, tag recommendation, and tag refinement. These techniques accomplish multimedia tagging by jointly exploiting humans and computers in different ways. This article refers to them collectively as assistive tagging and conducts a comprehensive survey of existing research efforts on this theme. We first introduce the status of automatic tagging and manual tagging and then state why assistive tagging can be a good solution. We categorize existing assistive tagging techniques into three paradigms: (1) tagging with data selection and organization; (2) tag recommendation; and (3) tag processing. We introduce the research efforts on each paradigm and summarize the methodologies. We also provide a discussion on several future trends in this research direction. © 2012 ACM.

Zhang T., Advanced Digital Sciences Center | Ghanem B., King Abdullah University of Science and Technology | Liu S., National University of Singapore | Ahuja N., University of Illinois at Urbana-Champaign
International Journal of Computer Vision | Year: 2013

In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote as Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in Multi-Task Tracking (MTT). By employing popular l_{p,q} mixed norms (specifically p ∈ {2, ∞} and q = 1), we regularize the representation problem to enforce joint sparsity and learn the particle representations together. Compared to previous methods that handle particles independently, our results demonstrate that mining the interdependencies between particles improves both tracking performance and overall computational efficiency. Interestingly, we show that the popular L1 tracker (Mei and Ling, IEEE Trans Pattern Anal Mach Intell 33(11):2259-2272, 2011) is a special case of our MTT formulation (denoted as the L11 tracker) when p = q = 1. Under the MTT framework, some of the tasks (particle representations) are often more closely related and more likely to share common relevant covariates than other tasks. Therefore, we extend the MTT framework to take into account pairwise structural correlations between particles (e.g., spatial smoothness of representation) and denote the novel framework as S-MTT. The problem of learning the regularized sparse representation in MTT and S-MTT can be solved efficiently using an Accelerated Proximal Gradient (APG) method that yields a sequence of closed-form updates. As such, S-MTT and MTT are computationally attractive. We test our proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that S-MTT is much better than MTT, and both methods consistently outperform state-of-the-art trackers. © 2012 Springer Science+Business Media New York.
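The l_{p,q} mixed norm that drives the joint-sparsity regularization above is straightforward to compute, and a small sketch makes the special cases concrete. Here the columns of C are assumed to be the per-particle representation vectors (the tasks), so each row holds one dictionary coefficient across all tasks; this is an illustration of the norm itself, not the authors' APG solver.

```python
import numpy as np

def mixed_norm(C, p, q=1):
    """l_{p,q} mixed norm of a matrix C: take the l_p norm of each
    row (one coefficient shared across all particle tasks), then
    the l_q norm of those row norms. With q = 1 and p in {2, inf},
    this promotes joint (row-wise) sparsity: whole coefficients are
    switched on or off for all particles at once."""
    row_norms = np.linalg.norm(C, ord=p, axis=1)
    return float(np.linalg.norm(row_norms, ord=q))
```

With p = q = 1 the mixed norm collapses to the plain sum of absolute values, which is exactly why the L1 tracker falls out as the L11 special case of the MTT formulation.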
