Ma B., University of Chinese Academy of Sciences | Huang R., Huazhong University of Science and Technology | Qin L., CAS Institute of Computing Technology
Neurocomputing | Year: 2015

Building on recent advances in the Fisher kernel framework for image classification, this paper proposes a novel image representation for head yaw estimation. Specifically, for each pixel of the image, a concise 9-dimensional local descriptor is computed, consisting of the pixel coordinates, the intensity, the first- and second-order derivatives, and the magnitude and orientation of the gradient. These local descriptors are encoded by Fisher vectors before being pooled to produce a global representation of the image. The proposed image representation is effective for head yaw estimation and can be further improved by metric learning. A series of head yaw estimation experiments has been conducted on five datasets, and the results show that the new image representation improves on the current state of the art for head yaw estimation. © 2014 Elsevier B.V.
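As a hypothetical illustration of the 9-dimensional per-pixel descriptor described above (pixel coordinates, intensity, first- and second-order derivatives, and gradient magnitude and orientation), the following NumPy sketch computes one such vector per pixel; the specific derivative filters used by the authors are an assumption:

```python
import numpy as np

def local_descriptors(image):
    """Compute a 9-D descriptor per pixel: (x, y, intensity,
    Ix, Iy, Ixx, Iyy, gradient magnitude, gradient orientation).
    A sketch of the descriptor named in the abstract; the exact
    derivative operators are assumed, not taken from the paper."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]        # pixel coordinates
    iy, ix = np.gradient(img)          # first-order derivatives
    iyy = np.gradient(iy, axis=0)      # second-order derivatives
    ixx = np.gradient(ix, axis=1)
    mag = np.hypot(ix, iy)             # gradient magnitude
    ori = np.arctan2(iy, ix)           # gradient orientation
    feats = np.stack([xs, ys, img, ix, iy, ixx, iyy, mag, ori], axis=-1)
    return feats.reshape(-1, 9)        # one 9-D descriptor per pixel
```

The resulting (num_pixels, 9) matrix is what a Fisher vector encoder would then aggregate into a global image signature.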

Ma B., University of Chinese Academy of Sciences | Su Y., University of Caen Lower Normandy | Jurie F., University of Caen Lower Normandy
Image and Vision Computing | Year: 2014

Avoiding complicated pre-processing steps such as accurate face and body part segmentation or image normalization, this paper proposes a novel face/person image representation that properly handles background and illumination variations. Denoted gBiCov, this representation relies on the combination of Biologically Inspired Features (BIF) and Covariance descriptors [1]. More precisely, gBiCov is obtained by computing and encoding the difference between BIF features at different scales. The distance between two persons can then be efficiently measured by computing the Euclidean distance between their signatures, avoiding the time-consuming operations on the Riemannian manifold that Covariance descriptors require. In addition, the recently proposed KISSME framework [2] is adopted to learn a metric adapted to the representation. To show the effectiveness of gBiCov, experiments are conducted on three person re-identification tasks (VIPeR, i-LIDS and ETHZ) and one face verification task (LFW), on which competitive results are obtained. As an example, the matching rate at rank 1 on the VIPeR dataset is 31.11%, improving the best previously published result by more than 10 points. © 2014 Elsevier B.V.
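The two ideas that make gBiCov efficient, encoding differences of features across scales and matching signatures by plain Euclidean distance, can be sketched as follows; the feature vectors here are hypothetical stand-ins, since the BIF/Covariance extraction itself is not reproduced:

```python
import numpy as np

def gbicov_like_signature(scale_feats):
    """Concatenate the differences between feature vectors at adjacent
    scales, mirroring gBiCov's 'difference of BIF features at different
    scales'. The inputs are assumed, illustrative feature vectors."""
    return np.concatenate([scale_feats[i + 1] - scale_feats[i]
                           for i in range(len(scale_feats) - 1)])

def rank1_match(probe_sig, gallery_sigs):
    """Return the index of the gallery signature closest to the probe
    under plain Euclidean distance, the cheap matching rule that
    replaces Riemannian-manifold computations."""
    dists = np.linalg.norm(np.asarray(gallery_sigs) - probe_sig, axis=1)
    return int(np.argmin(dists))
```

In practice a learned metric (e.g. KISSME) would replace the raw Euclidean distance, but the signature comparison stays a simple vector operation.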

Xu Y., Sun Yat Sen University | Huang R., Huazhong University of Science and Technology | Ma B., University of Chinese Academy of Sciences | Lin L., Sun Yat Sen University
MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia | Year: 2014

This paper presents a novel framework for a multimedia search task: searching for a person in a scene using human body appearance. Existing works mostly focus on two independent problems related to this task, i.e., people detection and person re-identification. However, a sequential combination of these two components does not solve the person search problem seamlessly, for two reasons: 1) the errors in people detection are unavoidably carried into person re-identification; 2) the setting of person re-identification differs from that of person search, which is essentially a verification problem. To bridge this gap, we propose a unified framework that jointly models the commonness of people (for detection) and the uniqueness of a person (for identification). We demonstrate the superior performance of our approach on public benchmarks, compared with a sequential combination of state-of-the-art detection and identification algorithms.

Cui Z., CAS Institute of Computing Technology | Cui Z., Huaqiao University | Cui Z., University of Chinese Academy of Sciences | Chang H., CAS Institute of Computing Technology | And 3 more authors.
Neurocomputing | Year: 2014

Video-based Face Recognition (VFR) can be cast as the problem of measuring the similarity of two image sets, where the frames of a video clip form one image set. In this paper, we treat the face images from each clip as an ensemble and formulate VFR as a Joint Sparse Representation (JSR) problem. In JSR, to adaptively learn the sparse representation of a probe clip, we simultaneously consider class-level and atom-level sparsity: the former structures the enrolled clips using a structured sparse regularizer (the L2,1-norm), while the latter selects a few related examples using a sparse regularizer (the L1-norm). In addition, we pre-train a compact dictionary to accelerate the algorithm, and impose a non-negativity constraint on the recovered coefficients to encourage positive correlations in the representation. Classification is ruled in favor of the class with the lowest accumulated reconstruction error. We conduct extensive experiments on three real-world databases: Honda, MoBo and YouTube Celebrities (YTC). The results demonstrate that our method compares favorably with state-of-the-art VFR methods. © 2013 Elsevier B.V.
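The final decision rule, lowest accumulated reconstruction error over the probe clip, can be sketched as follows; plain least squares stands in for the paper's joint sparse solver (L2,1 and L1 regularizers with non-negativity), so this shows only the decision structure, not the actual optimization:

```python
import numpy as np

def classify_by_reconstruction(probe_clip, class_dicts):
    """Pick the class whose dictionary yields the lowest accumulated
    reconstruction error over all frames of the probe clip.
    probe_clip: (dim, n_frames); each dictionary: (dim, n_atoms).
    Least squares replaces the paper's joint sparse solver here,
    so this is a structural sketch, not the published algorithm."""
    errors = []
    for D in class_dicts:
        coeffs, *_ = np.linalg.lstsq(D, probe_clip, rcond=None)
        resid = probe_clip - D @ coeffs        # per-frame residuals
        errors.append(np.linalg.norm(resid, axis=0).sum())
    return int(np.argmin(errors))
```

Accumulating the residual over every frame of the clip is what lets the ensemble, rather than any single image, drive the decision.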
