Time filter

Source Type

Australian, Australia

Shen F.,University of Electronic Science and Technology of China | Shen C.,University of Adelaide | Shen C.,Australian Center for Robotic Vision | Liu W.,IBM | Shen H.T.,University of Queensland
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2015

Recently, learning based hashing techniques have attracted broad research interests because they can support efficient storage and retrieval for high-dimensional data such as images, videos, documents, etc. However, a major difficulty of learning to hash lies in handling the discrete constraints imposed on the pursued hash codes, which typically makes hash optimizations very challenging (NP-hard in general). In this work, we propose a new supervised hashing framework, where the learning objective is to generate the optimal binary hash codes for linear classification. By introducing an auxiliary variable, we reformulate the objective such that it can be solved substantially efficiently by employing a regularization algorithm. One of the key steps in this algorithm is to solve a regularization sub-problem associated with the NP-hard binary optimization. We show that the sub-problem admits an analytical solution via cyclic coordinate descent. As such, a high-quality discrete solution can eventually be obtained in an efficient computing manner, therefore enabling to tackle massive datasets. We evaluate the proposed approach, dubbed Supervised Discrete Hashing (SDH), on four large image datasets and demonstrate its superiority to the state-of-the-art hashing methods in large-scale image retrieval. © 2015 IEEE. Source

Li Y.,University of Adelaide | Li Y.,NICTA | Liu L.,University of Adelaide | Shen C.,University of Adelaide | And 3 more authors.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2015

Mid-level visual element discovery aims to find clusters of image patches that are both representative and discriminative. In this work, we study this problem from the prospective of pattern mining while relying on the recently popularized Convolutional Neural Networks (CNNs). Specifically, we find that for an image patch, activation extracted from the first fully-connected layer of a CNN have two appealing properties which enable its seamless integration with pattern mining. Patterns are then discovered from a large number of CNN activations of image patches through the well-known association rule mining. When we retrieve and visualize image patches with the same pattern (See Fig. 1), surprisingly, they are not only visually similar but also semantically consistent. We apply our approach to scene and object classification tasks, and demonstrate that our approach outperforms all previous works on mid-level visual element discovery by a sizeable margin with far fewer elements being used. Our approach also outperforms or matches recent works using CNN for these tasks. Source code of the complete system is available online. © 2015 IEEE. Source

Zhang C.,Beijing Institute of Technology | Zhang C.,University of Adelaide | Shen C.,University of Adelaide | Shen C.,Australian Center for Robotic Vision | Shen T.,Beijing Institute of Technology
International Journal of Computer Vision | Year: 2015

We propose a fast, accurate matching method for estimating dense pixel correspondences across scenes. It is a challenging problem to estimate dense pixel correspondences between images depicting different scenes or instances of the same object category. While most such matching methods rely on hand-crafted features such as SIFT, we learn features from a large amount of unlabeled image patches using unsupervised learning. Pixel-layer features are obtained by encoding over the dictionary, followed by spatial pooling to obtain patch-layer features. The learned features are then seamlessly embedded into a multi-layer matching framework. We experimentally demonstrate that the learned features, together with our matching model, outperform state-of-the-art methods such as the SIFT flow (Liu et al. in IEEE Trans Pattern Anal Mach Intell 33(5):978–994, 2011), coherency sensitive hashing (Korman and Avidan in: Proceedings of the IEEE international conference on computer vision (ICCV), 2011) and the recent deformable spatial pyramid matching (Kim et al. in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2013) methods both in terms of accuracy and computation efficiency. Furthermore, we evaluate the performance of a few different dictionary learning and feature encoding methods in the proposed pixel correspondence estimation framework, and analyze the impact of dictionary learning and feature encoding with respect to the final matching performance. © 2015 Springer Science+Business Media New York Source

Shen F.,University of Electronic Science and Technology of China | Shen C.,University of Adelaide | Shen C.,Australian Center for Robotic Vision | Shi Q.,University of Adelaide | And 4 more authors.
IEEE Transactions on Image Processing | Year: 2015

Learning-based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes preserving the Euclidean similarity in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. The complexities of these models, and the problems with out-of-sample data, have previously rendered them unsuitable for application to large-scale embedding, however. In this paper, how to learn compact binary embeddings on their intrinsic manifolds is considered. In order to address the above-mentioned difficulties, an efficient, inductive solution to the out-of-sample data problem, and a process by which nonparametric manifold learning may be used as the basis of a hashing method are proposed. The proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. It is particularly shown that hashing on the basis of t-distributed stochastic neighbor embedding outperforms state-of-the-art hashing methods on large-scale benchmark data sets, and is very effective for image classification with very short code lengths. It is shown that the proposed framework can be further improved, for example, by minimizing the quantization error with learned orthogonal rotations without much computation overhead. In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance. © 1992-2012 IEEE. Source

Liu L.,University of Adelaide | Wang L.,University of Wollongong | Shen C.,University of Adelaide | Shen C.,Australian Center for Robotic Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2016

Compact and discriminative visual codebooks are preferred in many visual recognition tasks. In the literature, a number of works have taken the approach of hierarchically merging visual words of an initial large-sized codebook, but implemented this approach with different merging criteria. In this work, we propose a single probabilistic framework to unify these merging criteria, by identifying two key factors: the function used to model the class-conditional distribution and the method used to estimate the distribution parameters. More importantly, by adopting new distribution functions and/or parameter estimation methods, our framework can readily produce a spectrum of novel merging criteria. Three of them are specifically discussed in this paper. For the first criterion, we adopt the multinomial distribution with the Bayesian method; For the second criterion, we integrate the Gaussian distribution with maximum likelihood parameter estimation. For the third criterion, which shows the best merging performance, we propose a max-margin-based parameter estimation method and apply it with the multinomial distribution. Extensive experimental study is conducted to systematically analyze the performance of the above three criteria and compare them with existing ones. As demonstrated, the best criterion within our framework achieves the overall best merging performance among the compared merging criteria developed in the literature. © 1979-2012 IEEE. Source

Discover hidden collaborations