Xie X.,Sun Yat Sen University | Xie X.,Guangdong Key Laboratory of Information Security Technology | Yang L.,Hong Kong Polytechnic University | Yang L.,National University of Defense Technology | And 2 more authors.
Computer Vision and Image Understanding | Year: 2016

A real-world object surface often consists of multiple materials. Recognizing surface materials is important because it significantly benefits understanding the quality and functionality of the object. However, identifying multiple materials on a surface from a single photograph is very challenging because different materials are often interwoven and hard to segment for separate identification. To address this problem, we present a multi-label learning framework for identifying multiple materials of a real-world object surface without segmenting each of them. We find that there are potential correlations between materials and that these correlations depend on the object category. For example, the surface of a monitor likely consists of plastic and glass rather than wood or stone. This motivates us to learn the correlations of material labels locally on each semantic object cluster. To this end, samples are semantically grouped according to their object categories. For each group of samples, we employ a Directed Acyclic Graph (DAG) to encode the conditional dependencies of material labels. These object-specific DAGs are then used to assist the inference of surface materials. The key enabler of the proposed method is that object recognition provides a semantic cue for material recognition through object-specific DAG learning. We test our method on the ALOT database and show consistent improvements over the state of the art. © 2016 Elsevier Inc. All rights reserved.

Chen Y.-C.,Sun Yat Sen University | Zheng W.-S.,Sun Yat Sen University | Zheng W.-S.,Guangdong Provincial Key Laboratory of Computational Science | Lai J.,Sun Yat Sen University | Lai J.,Guangdong Key Laboratory of Information Security Technology
IJCAI International Joint Conference on Artificial Intelligence | Year: 2015

Person re-identification concerns the matching of pedestrians across disjoint camera views. Due to changes in viewpoint, lighting conditions and camera features, images of the same person from different views always appear different, and thus feature representations of the same person across disjoint camera views follow different distributions. In this work, we propose an effective, low-cost and easy-to-apply schema called the Mirror Representation, which embeds the view-specific feature transformation and enables alignment of the feature distributions across disjoint views for the same person. The proposed Mirror Representation is also designed to explicitly model the relation between different view-specific transformations while controlling their discrepancy. With our Mirror Representation, we can enhance existing subspace/metric learning models significantly, and we show in particular, through extensive experiments on VIPeR, PRID450S and CUHK01, that kernel marginal Fisher analysis significantly outperforms the current state-of-the-art methods.

Wu B.,Sun Yat Sen University | Wu B.,China University of Technology | Wu B.,National University of Defense Technology | Yang Q.,Sun Yat Sen University | And 4 more authors.
IJCAI International Joint Conference on Artificial Intelligence | Year: 2015

Cross-modal hashing is designed to facilitate fast search across domains. In this work, we present a cross-modal hashing approach, called quantized correlation hashing (QCH), which takes into consideration the quantization loss over domains and the relation between domains. Unlike previous approaches that optimize the quantizer independently of maximizing the domain correlation, our approach optimizes both simultaneously. The underlying relation between the domains that describe the same objects is established by maximizing the correlation between the hash codes across the domains. The resulting multi-modal objective function is transformed to a unimodal formalization, which is optimized through an alternating procedure. Experimental results on three real-world datasets demonstrate that our approach outperforms the state-of-the-art multi-modal hashing methods.
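
The two quantities named in the abstract above, quantization loss and cross-domain code agreement, can be illustrated with a minimal sketch. This is not the QCH objective: the function names, the sign-based quantizer and the agreement score are assumptions chosen only to make the two terms concrete for one pair of real-valued embeddings.

```python
# Illustrative only: hypothetical helpers, not the QCH formulation.

def sign_code(v):
    """Quantize a real-valued embedding to a +1/-1 hash code."""
    return [1 if x >= 0 else -1 for x in v]

def quantization_loss(v):
    """Squared distance between an embedding and its binary code."""
    return sum((b - x) ** 2 for b, x in zip(sign_code(v), v))

def code_agreement(u, v):
    """Mean bitwise product of two codes: 1.0 if identical, -1.0 if opposite."""
    bu, bv = sign_code(u), sign_code(v)
    return sum(a * b for a, b in zip(bu, bv)) / len(bu)

# Hypothetical embeddings of the same object in two domains.
image_embedding = [0.9, -0.2]
text_embedding = [0.4, -0.7]
print(quantization_loss(image_embedding), code_agreement(image_embedding, text_embedding))
```

A joint method in the spirit of the abstract would trade off both terms at once rather than fixing the projection first and quantizing afterwards.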

You J.,Sun Yat Sen University | You J.,Guangdong Provincial Key Laboratory of Computational Science | Wu A.,Sun Yat Sen University | Wu A.,Guangdong Provincial Key Laboratory of Computational Science | And 4 more authors.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2016

Most existing person re-identification (re-id) models focus on matching still person images across disjoint camera views. Since only limited information can be exploited from still images, it is hard (if not impossible) to overcome the occlusion, pose and camera-view change, and lighting variation problems. In comparison, video-based re-id methods can utilize extra space-time information, which contains much richer cues for matching to overcome these problems. However, we find that with a video-based representation, some inter-class differences can be much more obscure than with a still-image-based representation, because different people may have not only similar appearance but also similar motions and actions, which are hard to align. To solve this problem, we propose a top-push distance learning model (TDL), in which we integrate a top-push constraint for matching video features of persons. The top-push constraint enforces the optimization on top-rank matching in re-id, so as to make the matching model more effective at selecting more discriminative features to distinguish different persons. Our experiments show that the proposed video-based re-id framework outperforms the state-of-the-art video-based re-id methods.
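
As a rough illustration of what a top-push constraint asks for, the sketch below computes a hinge penalty that requires each same-person pair to be closer than the nearest different-person sample by a margin. It is one common way to write a top-push hinge, not necessarily the objective used in TDL; the function names, the Euclidean distance and the margin value are assumptions.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_push_loss(features, labels, margin=1.0):
    """Sum of hinge penalties against the closest negative of each sample."""
    total = 0.0
    n = len(features)
    for i in range(n):
        # Distances from sample i to every sample of a different identity.
        negatives = [euclidean(features[i], features[k])
                     for k in range(n) if labels[k] != labels[i]]
        if not negatives:
            continue
        hardest_negative = min(negatives)
        for j in range(n):
            if j != i and labels[j] == labels[i]:
                positive = euclidean(features[i], features[j])
                # Penalize positives that are not closer than the
                # top-ranked negative by at least the margin.
                total += max(0.0, positive - hardest_negative + margin)
    return total

# Toy data: two samples of identity 0 and one sample of identity 1.
print(top_push_loss([[0.0, 0.0], [3.0, 0.0], [1.0, 0.0]], [0, 0, 1]))
```

Only the closest negative enters each penalty, which is what focuses the optimization on the top of the ranking rather than on all negative pairs.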

Chang X.,Sun Yat Sen University | Chang X.,SYSU CMU Shunde International Joint Research Institute | Zheng W.-S.,Sun Yat Sen University | Zheng W.-S.,Guangdong Provincial Key Laboratory of Computational Science | Zhang J.,University of Dundee
IEEE Transactions on Image Processing | Year: 2015

Collective activity is a collection of atomic activities (individual person's activity) and can hardly be distinguished by an atomic activity in isolation. The interactions among people are important cues for recognizing collective activity. In this paper, we concentrate on modeling the person-person interactions for collective activity recognition. Rather than relying on hand-crafted descriptions of the person-person interaction, we propose a novel learning-based approach that is capable of computing the class-specific person-person interaction patterns. In particular, we model each class of collective activity by an interaction matrix, which is designed to measure the connection between any pair of atomic activities in a collective activity instance. We then formulate an interaction response (IR) model by assembling all these measurements and make the IRs class specific and distinct from each other. A multitask IR is further proposed to jointly learn different person-person interaction patterns, in order to learn the relation between different person-person interactions while keeping a more distinct activity-specific factor for each interaction. Our model is able to exploit a discriminative low-rank representation of person-person interaction. Experimental results on two challenging data sets demonstrate that our proposed model is comparable with the state-of-the-art models and show that learning person-person interactions plays a critical role in collective activity recognition. © 1992-2012 IEEE.

Zhou Z.,Sun Yat Sen University | Zhou Z.,National University of Defense Technology | Zheng W.-S.,Sun Yat Sen University | Zheng W.-S.,Guangdong Provincial Key Laboratory of Computational Science | And 3 more authors.
Pattern Recognition | Year: 2016

Online learning is very important for processing sequential data and also helps alleviate the computational burden on large-scale data. In particular, one-pass online learning predicts the label of each incoming sample and updates the model based on the prediction, where each incoming sample is used only once and never stored. So far, existing one-pass online learning methods are globally modeled and do not take the local structure of the data distribution into consideration, which is a significant factor in handling the nonlinear data separation case. In this work, we propose a local online learning (LOL) method, a multiple hyperplane Passive Aggressive algorithm integrated with online clustering, so that all local hyperplanes are learned jointly and work cooperatively. This is achieved by formulating a common component as information traffic among multiple hyperplanes in LOL. A joint optimization algorithm is proposed and a theoretical analysis of the cumulative error is also provided. Extensive experiments on 11 datasets show that LOL can learn a nonlinear decision boundary, overall achieving notably better performance without using any kernel modeling or second-order modeling. © 2015 Elsevier Ltd.
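
For readers unfamiliar with the building block named above, here is a minimal sketch of the classic single-hyperplane Passive Aggressive update for binary labels in a one-pass setting. It is not the LOL method: the multiple hyperplanes, the online clustering and the shared component are omitted, and the function names are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pa_update(w, x, y):
    """Classic PA step: move w just enough to reach hinge loss zero on (x, y)."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    norm_sq = dot(x, x)
    if loss == 0.0 or norm_sq == 0.0:
        return w  # passive: the margin is already satisfied
    tau = loss / norm_sq  # aggressive: smallest step that fixes the margin
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

def one_pass(stream, dim):
    """Predict, count mistakes, update; each sample is seen once and not stored."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        prediction = 1 if dot(w, x) >= 0 else -1
        if prediction != y:
            mistakes += 1
        w = pa_update(w, x, y)
    return w, mistakes

# Toy stream of (features, label) pairs with labels in {+1, -1}.
print(one_pass([([1.0, 0.0], 1), ([2.0, 0.0], -1)], dim=2))
```

A single hyperplane of this kind can only separate the data linearly, which is the limitation the abstract addresses by learning several local hyperplanes jointly.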
