Yuan C.,National Laboratory of Pattern Recognition | Hu W.,National Laboratory of Pattern Recognition | Tian G.,National Laboratory of Pattern Recognition | Yang S.,National Laboratory of Pattern Recognition | Wang H.,Nanjing Southeast University
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2013

In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning (MTSL) framework, which aims to reconstruct a test sample with multiple features from as few bases as possible. Learning the sparse representation under each feature modality is treated as a single task in MTSL. Since the tasks are generated from multiple features associated with the same visual input, they are not independent but inter-related. We introduce a Beta process (BP) prior into the hierarchical MTSL model, which efficiently learns a compact dictionary and infers the sparse structure shared across all the tasks. The MTSL model makes coefficient estimation more robust than performing each task independently. Moreover, sparseness is achieved through the Beta process formulation rather than the computationally expensive L1-norm penalty. With non-informative gamma hyper-priors, the sparsity level is determined entirely by the data. Finally, the learning problem is solved by Gibbs sampling, which estimates the full posterior over the model parameters. Experimental results on the KTH and UCF Sports datasets demonstrate the effectiveness of the proposed MTSL approach for action recognition. © 2013 IEEE.
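
The way a Beta process prior yields sparsity without an L1 penalty can be illustrated with its common finite Beta-Bernoulli approximation. The sketch below is a simplified illustration, not the authors' model: it assumes a single binary support vector shared by all tasks, unit-variance Gaussian weights (no gamma hyper-priors), and no Gibbs sampling; the function name `sample_shared_support` and parameters `a`, `b` are hypothetical.

```python
import random

def sample_shared_support(num_atoms, num_tasks, a=1.0, b=1.0, seed=0):
    """Draw a binary support shared by all tasks from a finite Beta-Bernoulli
    approximation of a Beta process, plus masked task-specific weights.
    Simplified sketch: unit-variance weights, no hyper-priors, no inference."""
    if num_atoms < 2:
        raise ValueError("the finite approximation needs at least two atoms")
    rng = random.Random(seed)
    K = num_atoms
    # Atom usage probabilities: pi_k ~ Beta(a/K, b*(K-1)/K); most end up near 0.
    pis = [rng.betavariate(a / K, b * (K - 1) / K) for _ in range(K)]
    # One binary indicator per atom, shared by every task (feature modality).
    z = [1 if rng.random() < p else 0 for p in pis]
    # Each task draws its own Gaussian weights, masked by the shared support.
    weights = [[z[k] * rng.gauss(0.0, 1.0) for k in range(K)]
               for _ in range(num_tasks)]
    return z, weights

z, w = sample_shared_support(num_atoms=200, num_tasks=3)
print(sum(z), "active atoms out of", len(z))
```

Under this parameterization the expected number of active atoms is K·a/(a + b(K−1)), which approaches a/b as K grows, so the number of active dictionary atoms is governed by the prior (and, in the full model, by the data) rather than by a hand-tuned regularization weight.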

Niu B.,National Laboratory of Pattern Recognition | Cheng J.,National Laboratory of Pattern Recognition | Bai X.,Beihang University | Lu H.,National Laboratory of Pattern Recognition
Signal Processing | Year: 2013

Relevance feedback is an effective approach to improving the performance of image retrieval by leveraging human labeling. To alleviate the labeling burden, active learning methods have been introduced to select the most informative samples for labeling. In this paper, we present a novel batch mode active learning scheme for informative sample selection. Inspired by graph propagation, we take into account not only the correlation between labeled and unlabeled samples but also the correlation among unlabeled samples. In particular, considering the unbalanced distribution of samples and the personalized nature of human feedback, we propose an asymmetric propagation scheme that unifies various criteria, including uncertainty, diversity and density, into batch mode active learning for relevance feedback. Extensive experiments on publicly available datasets show that the proposed method is promising. © 2012 Elsevier B.V.
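
To make the three selection criteria concrete, here is a generic greedy batch-selection sketch that scores unlabeled samples by uncertainty and density and penalizes redundancy with already-selected samples. It is not the asymmetric propagation scheme described above; the scoring formula, the trade-off weights `alpha` and `beta`, and the function name `select_batch` are illustrative assumptions.

```python
def select_batch(probs, sim, batch_size, alpha=1.0, beta=1.0):
    """Greedy batch selection (generic sketch, not the paper's method).
    probs[i]: current relevance probability of unlabeled sample i.
    sim[i][j]: similarity between unlabeled samples i and j (symmetric)."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two unlabeled samples")
    # Uncertainty peaks at p = 0.5 and is 0 at p = 0 or 1.
    uncertainty = [1.0 - abs(2.0 * p - 1.0) for p in probs]
    # Density: mean similarity to the other unlabeled samples.
    density = [sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
               for i in range(n)]
    selected = []
    while len(selected) < min(batch_size, n):
        best, best_score = None, float("-inf")
        for i in range(n):
            if i in selected:
                continue
            # Diversity: penalize similarity to anything already in the batch.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            score = uncertainty[i] + alpha * density[i] - beta * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

sim = [[1.0, 0.99, 0.1, 0.1],
       [0.99, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.1],
       [0.1, 0.1, 0.1, 1.0]]
print(select_batch([0.5, 0.5, 0.95, 0.4], sim, batch_size=2))
```

In this toy input, samples 0 and 1 are near-duplicates, so the redundancy term pushes the second pick away from sample 1 even though it is just as uncertain.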

Wu B.,National Laboratory of Pattern Recognition | Lyu S.,University at Albany | Hu B.-G.,National Laboratory of Pattern Recognition | Ji Q.,Rensselaer Polytechnic Institute
Proceedings of the IEEE International Conference on Computer Vision | Year: 2013

We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed face tracklets) in videos. The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from each other's solutions. Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations. We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. We demonstrate significant improvements over state-of-the-art results in face tracking and clustering performance on several video datasets. © 2013 IEEE.
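
The optimal-matching step for linking tracklets can be illustrated as a one-to-one assignment that maximizes total affinity. This sketch assumes a square, precomputed affinity matrix between tracklet ends and tracklet starts (the name `best_linking` and the example values are hypothetical) and ignores the joint clustering constraints of the full model. It enumerates permutations, which is only feasible for a handful of tracklets; in practice the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) solves the same problem in polynomial time.

```python
from itertools import permutations

def best_linking(similarity):
    """Exhaustive optimal one-to-one matching (illustrative, O(n!)).
    similarity[i][j]: affinity between the end of tracklet i and the
    start of tracklet j. Returns (assignment, total_score)."""
    n = len(similarity)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(similarity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score

sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.4],
       [0.2, 0.7, 0.6]]
links, total = best_linking(sim)
print(links, round(total, 2))
```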

Zhou G.,Central China Normal University | He T.,Central China Normal University | Zhao J.,National Laboratory of Pattern Recognition | Hu P.,National Laboratory of Pattern Recognition
ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference | Year: 2015

Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem poses a new challenge for question retrieval in cQA. In this paper, we propose to learn continuous word embeddings with metadata of category information within cQA pages for question retrieval. To deal with the variable number of word embedding vectors, we employ the Fisher kernel framework to aggregate them into fixed-length vectors. Experimental results on a large-scale real-world cQA data set show that our approach can significantly outperform state-of-the-art translation models and topic-based models for question retrieval in cQA. © 2015 Association for Computational Linguistics.
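
A common way to apply the Fisher kernel framework to a variable-size set of vectors is the Fisher vector under a diagonal-covariance Gaussian mixture model. The sketch below assumes the GMM parameters are already fitted, keeps only the gradient with respect to the component means, and omits the power and L2 normalization often applied afterward; the function name and toy parameters are hypothetical, not taken from the paper. Whatever the number of words T, the output always has K·D entries.

```python
import math

def fisher_vector_means(embeddings, weights, means, sigmas):
    """Mean-gradient Fisher vector of a set of D-dim embeddings under a
    K-component diagonal GMM (weights, means, sigmas assumed pre-fitted)."""
    K, D, T = len(means), len(means[0]), len(embeddings)
    grad = [[0.0] * D for _ in range(K)]
    for x in embeddings:
        # Log of w_k * N(x; mu_k, diag(sigma_k^2)) for each component.
        logp = []
        for k in range(K):
            s = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                s += -0.5 * z * z - math.log(sigmas[k][d]) - 0.5 * math.log(2 * math.pi)
            logp.append(s)
        # Posterior responsibilities gamma_k(x), computed stably.
        m = max(logp)
        exps = [math.exp(l - m) for l in logp]
        total = sum(exps)
        for k in range(K):
            gamma = exps[k] / total
            for d in range(D):
                grad[k][d] += gamma * (x[d] - means[k][d]) / sigmas[k][d]
    # Normalize by the number of words and the mixture weight.
    return [grad[k][d] / (T * math.sqrt(weights[k]))
            for k in range(K) for d in range(D)]

w = [0.5, 0.5]
mu = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
sd = [[1.0] * 3, [1.0] * 3]
short_q = fisher_vector_means([[0.1, 0.2, 0.3]], w, mu, sd)
long_q = fisher_vector_means([[0.1, 0.2, 0.3], [9.0, 9.5, 10.5], [1.0, 0.0, -1.0]], w, mu, sd)
print(len(short_q), len(long_q))
```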

Zhang J.,National Laboratory of Pattern Recognition | Wang H.,Shandong University | Davoine F.,French National Center for Scientific Research | Pan C.,National Laboratory of Pattern Recognition
Proceedings - International Conference on Pattern Recognition | Year: 2012

A simple and efficient skin detector facilitates automatic and robust human detection and tracking. In this paper, we propose a new skin detection method based on a linear regression tree, which decomposes the problem of discriminating between skin and non-skin colors into several simpler problems. Experimental results on the MCG skin database demonstrate its better generalization ability and discriminability compared with state-of-the-art methods. © 2012 ICPR Org Committee.

Zhang J.,National Laboratory of Pattern Recognition | Zhang D.,Toshiba Corporation | Hao J.,Toshiba Corporation
IJCAI International Joint Conference on Artificial Intelligence | Year: 2015

Statistical machine translation models have made great progress in improving translation quality. However, existing models predict the target translation using only source- and target-side local context information. In practice, distinguishing good translations from bad ones depends not only on local features but also on global sentence-level information. In this paper, we explore source-side global sentence-level features for target-side local translation prediction. We propose a novel bilingually-constrained chunk-based convolutional neural network to learn sentence semantic representations. With the sentence-level feature representation, we further design a feed-forward neural network to better predict translations using both local and global information. Large-scale experiments show that our method obtains substantial improvements in translation quality over a strong baseline: the hierarchical phrase-based translation model augmented with the neural network joint model.

Liu J.,National Laboratory of Pattern Recognition | Liu B.,National Laboratory of Pattern Recognition | Lu H.,National Laboratory of Pattern Recognition
Pattern Recognition | Year: 2015

Deep learning models have gained significant interest as a way of building hierarchical image representations. However, current models still fall far behind the human visual system because they lack selectivity, lack high-level guidance for learning, and learn poorly from few examples. To address these problems, we propose a detection-guided hierarchical learning algorithm for image representation. First, we train a multi-layer deconvolutional network in an unsupervised bottom-up scheme. During training, we use each raw image as input and decompose it using multiple alternating layers of non-negative convolutional sparse coding and max-pooling. Inspired by the observation that the filters in the top layer can be selectively activated by different high-level structures of images, i.e., one or a few filters should correspond to a particular object class, we update the filters in the network by minimizing the reconstruction errors of the corresponding feature maps with respect to object detection maps obtained from a set of pre-trained detectors. With the fine-tuned network, we can extract features of given images in a purely unsupervised way, without the need for detectors. We evaluate the proposed feature representation on the task of object recognition, using an SVM classifier with a spatial pyramid matching kernel. Experiments on the PASCAL VOC 2007, Caltech-101 and Caltech-256 datasets demonstrate that our approach outperforms some recent hierarchical feature descriptors as well as classical hand-crafted features. © 2015 Elsevier Ltd. All rights reserved.

Liu B.,National Laboratory of Pattern Recognition | Liu J.,National Laboratory of Pattern Recognition | Lu H.,National Laboratory of Pattern Recognition
Neurocomputing | Year: 2014

Spatial Pyramid Matching is a successful extension of the bag-of-features model that embeds spatial information of local features: the image is divided into a sequence of increasingly finer grids, and these grids are uniform spatial partitions chosen in an ad-hoc manner without any theoretical motivation. A uniform spatial partition cannot adapt to the different spatial distributions across image categories. To this end, we propose a data-driven approach that adaptively learns the discriminative spatial partitions corresponding to each class and exploits them for image classification. First, a set of over-complete spatial partitions covering various spatial distributions of local features is created in a flexible manner, and we concatenate the feature representations of each partitioned region. Then we adopt a discriminative learning formulation with a group sparse constraint to find a sparse mapping from the feature representation to the label space. To further enhance the robustness of the model, we compress the feature representation by removing the dimensions corresponding to unimportant partitioned regions, and use the compressed representation to generate a multi-region matching kernel for training a one-versus-others SVM classifier. Experiments on three object datasets (i.e. Caltech-101, Caltech-256, Pascal VOC 2007) and one scene dataset (i.e. 15-Scenes) demonstrate the effectiveness of our proposed method. © 2014 Elsevier B.V.
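
For reference, the uniform-grid Spatial Pyramid Matching baseline that this work sets out to improve can be sketched as follows, using the standard level weighting of Lazebnik et al. (1/2^L for level 0 and 1/2^(L−l+1) for level l ≥ 1). This is not the proposed learned-partition method; it assumes each local feature is already quantized to a visual-word index and has image coordinates normalized to [0, 1). The function names are hypothetical.

```python
def spm_histogram(features, vocab_size, levels):
    """Concatenated, level-weighted bag-of-words histograms over uniform grids.
    features: list of (x, y, word) with x, y in [0, 1) and 0 <= word < vocab_size."""
    hist = []
    for l in range(levels + 1):
        cells = 2 ** l  # level l splits the image into cells x cells blocks
        weight = 1.0 / 2 ** levels if l == 0 else 1.0 / 2 ** (levels - l + 1)
        level_hist = [0.0] * (cells * cells * vocab_size)
        for x, y, w in features:
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            level_hist[(cy * cells + cx) * vocab_size + w] += weight
        hist.extend(level_hist)
    return hist

def intersection_kernel(h1, h2):
    # Since min(w*a, w*b) = w*min(a, b), pre-weighting the histograms makes this
    # equal to the weighted sum of per-level histogram intersections.
    return sum(min(a, b) for a, b in zip(h1, h2))

f = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.5, 0.2, 0)]
g = [(0.9, 0.1, 0), (0.1, 0.9, 1), (0.5, 0.2, 0)]
hf, hg = spm_histogram(f, 2, 2), spm_histogram(g, 2, 2)
print(intersection_kernel(hf, hf), intersection_kernel(hf, hg))
```

Because the grid is fixed, two images with identical visual words but features in different quadrants score lower at the finer levels, regardless of whether that spatial layout is actually discriminative for the category; this rigidity is the limitation the learned partitions address.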

Liu B.,National Laboratory of Pattern Recognition | Liu J.,National Laboratory of Pattern Recognition | Lu H.,National Laboratory of Pattern Recognition
Computer Vision and Image Understanding | Year: 2015

How to build a suitable image representation remains a critical problem in computer vision. Traditional Bag-of-Features (BoF) based models build image representations through a pipeline of local feature extraction, feature coding and spatial pooling. However, three major shortcomings limit their performance: the limitations of hand-designed features, the loss of discriminative power in local appearance coding, and the lack of spatial information. To overcome these limitations, we propose a generalized BoF-based framework that is learned hierarchically using recently developed deep learning methods. First, with raw images as input, we densely extract local patches and learn local features with a stacked Independent Subspace Analysis network. The learned features are then transformed into appearance codes by sparse Restricted Boltzmann Machines. Second, we perform spatial max-pooling over a set of over-complete spatial regions, generated to cover various spatial distributions, to incorporate more flexible spatial information. Third, a structured sparse Auto-encoder is proposed to encode the region representations into an image-level signature. To learn the proposed hierarchy, we pre-train the network layer-wise in an unsupervised manner, followed by supervised fine-tuning with image labels. Extensive experiments on different benchmarks, i.e., UIUC-Sports, Caltech-101, Caltech-256, Scene-15 and MIT Indoor-67, demonstrate the effectiveness of our proposed model. © 2015 Elsevier Inc. All rights reserved.
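
The spatial max-pooling step over over-complete regions can be illustrated with a small sketch. The particular region set below (whole image, halves, thirds as horizontal and vertical strips, and quadrants) is a hypothetical choice for illustration, not the set used in the paper; codes are assumed non-negative (as RBM activation probabilities are), so pooling starts from zero.

```python
def overcomplete_regions():
    """Hypothetical over-complete, overlapping region set in normalized
    coordinates (x0, y0, x1, y1): whole image, strips, and quadrants."""
    regions = [(0.0, 0.0, 1.0, 1.0)]
    for n in (2, 3):
        regions += [(0.0, i / n, 1.0, (i + 1) / n) for i in range(n)]  # horizontal strips
        regions += [(i / n, 0.0, (i + 1) / n, 1.0) for i in range(n)]  # vertical strips
    regions += [(i / 2, j / 2, (i + 1) / 2, (j + 1) / 2)
                for i in range(2) for j in range(2)]  # 2x2 quadrants
    return regions

def max_pool_regions(codes, regions):
    """codes: list of (x, y, code_vector) with x, y in [0, 1) and
    non-negative codes. Returns one max-pooled vector per region."""
    dim = len(codes[0][2])
    pooled = []
    for x0, y0, x1, y1 in regions:
        best = [0.0] * dim  # valid starting point only for non-negative codes
        for x, y, c in codes:
            if x0 <= x < x1 and y0 <= y < y1:
                best = [max(b, v) for b, v in zip(best, c)]
        pooled.append(best)
    return pooled

codes = [(0.1, 0.1, [1.0, 0.0]), (0.9, 0.9, [0.2, 0.5])]
regions = overcomplete_regions()
pooled = max_pool_regions(codes, regions)
print(len(regions), pooled[0], pooled[1])
```

Because the regions overlap, the same local code can contribute to several region vectors; a later stage (the structured sparse Auto-encoder in the abstract) is what decides which regions matter.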

Li H.,National Laboratory of Pattern Recognition | Zhang J.,National Laboratory of Pattern Recognition | Zong C.,National Laboratory of Pattern Recognition
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2015

Discourse relations between two consecutive segments play an important role in many natural language processing (NLP) tasks. However, a large portion of discourse relations are implicit and difficult to detect due to the absence of connectives. Traditional detection approaches use discrete features, such as words, clusters and syntactic production rules, which depend strongly on linguistic resources and lead to severe data sparseness. In this paper, we instead propose a novel method to predict implicit discourse relations based purely on distributed representations of words, sentences and syntactic features. Furthermore, we learn distributed representations for different kinds of features. Experiments show that our proposed method achieves the best performance in most cases on the standard data sets. © Springer International Publishing Switzerland 2015.