Lv Y.,Key Laboratory of Photoelectronic Imaging Technology and System | Lv Y.,Lotus Hill Research Institute | Yao B.,University of California at Los Angeles | Yao B.,Lotus Hill Research Institute | And 3 more authors.
Proceedings of IEEE Workshop on Applications of Computer Vision | Year: 2012

In this paper, we learn a reconfigurable template for detecting vehicles and classifying their types. We adopt a popular design for the part-based model that has one coarse template covering the entire object window and several small high-resolution templates representing parts. The reconfigurable template can learn part configurations that capture the spatial correlation of features for a deformable part-based model. The features of templates are Histograms of Gradients (HoG). In order to better describe the actual dimensions and locations of "parts" (i.e. features with strong spatial correlations), we design a dictionary of rectangular primitives of various sizes, aspect-ratios and positions. A configuration is defined as a subset of non-overlapping primitives from this dictionary. Learning the optimal configuration with an SVM amounts to finding the subset of parts that minimizes the regularized hinge loss, which leads to a non-convex optimization problem. We solve this problem by replacing the hinge loss with a negative sigmoid loss that can be approximately decomposed into losses (or negative sigmoid scores) of individual parts. In the experiment, we compare our method empirically with group lasso and a state-of-the-art method [7] and demonstrate that models learned with our method outperform others on two computer vision applications: vehicle localization and vehicle model recognition. © 2012 IEEE.
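The notion of a configuration as a subset of non-overlapping rectangular primitives can be illustrated with a small sketch. The abstract does not say how the subset is searched, so the greedy rule below is only an assumed stand-in, and the `Primitive` type, its `score` field and the function names are hypothetical. Each score plays the role of an individual part's decomposed score.

```python
from typing import List, NamedTuple


class Primitive(NamedTuple):
    """Hypothetical rectangular primitive with a per-part score."""
    x: int
    y: int
    w: int
    h: int
    score: float


def overlaps(a: Primitive, b: Primitive) -> bool:
    """True if the two rectangles share any interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def greedy_configuration(dictionary: List[Primitive]) -> List[Primitive]:
    """Pick positive-score primitives, best first, skipping any overlap.

    Assumption: a greedy pass, not the optimization used in the paper.
    """
    chosen: List[Primitive] = []
    for p in sorted(dictionary, key=lambda q: q.score, reverse=True):
        if p.score > 0 and not any(overlaps(p, q) for q in chosen):
            chosen.append(p)
    return chosen


dictionary = [
    Primitive(0, 0, 4, 4, 0.9),
    Primitive(2, 2, 4, 4, 0.8),    # overlaps the first primitive
    Primitive(4, 0, 2, 2, 0.5),    # touches the first one only at an edge
    Primitive(10, 10, 1, 1, -0.1)  # negative score, never selected
]
config = greedy_configuration(dictionary)
```

Rectangles that only share an edge are treated as non-overlapping here, which is one possible reading of the constraint.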

Zhu S.-C.,University of California at Los Angeles | Zhu S.-C.,Lotus Hill Research Institute | Shi K.,University of California at Los Angeles | Si Z.,University of California at Los Angeles
Pattern Recognition Letters | Year: 2010

Natural images have a vast amount of visual patterns distributed in a wide spectrum of subspaces of varying complexities and dimensions. Understanding the characteristics of these subspaces and their compositional structures is of fundamental importance for pattern modeling, learning and recognition. In this paper, we start with small image patches and define two types of atomic subspaces: explicit manifolds of low dimensions for structural primitives and implicit manifolds of high dimensions for stochastic textures. Then we present an information theoretical learning framework that derives common models for these manifolds through information projection, and study a manifold pursuit algorithm that clusters image patches into those atomic subspaces and ranks them according to their information gains. We further show how those atomic subspaces change over an image scaling process and how they are composed to form larger and more complex image patterns. Finally, we integrate the implicit and explicit manifolds to form a primal sketch model as a generic representation in early vision and to generate a hybrid image template representation for object category recognition in high level vision. The study of the mathematical structures in the image space sheds light on some basic questions in human vision, such as atomic elements in visual perception, the perceptual metrics in various manifolds, and the perceptual transitions over image scales. This paper is based on the J.K. Aggarwal Prize lecture by the first author at the International Conference on Pattern Recognition, Tampa, FL, 2008. © 2009 Elsevier B.V. All rights reserved.

Zhao Y.,Beijing Jiaotong University | Zhao Y.,Lotus Hill Research Institute | Zhao Y.,University of California at Los Angeles | Zhu S.-C.,Lotus Hill Research Institute | And 2 more authors.
MM'10 - Proceedings of the ACM Multimedia 2010 International Conference | Year: 2010

This paper presents an interactive image segmentation framework which is ultra-fast and accurate. Our framework, termed "CO3", consists of three components: COupled representation, COnditional model and COnvex inference. (i) In representation, we pose the segmentation problem as partitioning an image domain into regions (foreground vs. background) or boundaries (on vs. off) which are dual but simultaneously compete with each other. Then, we formulate the segmentation process as a combinatorial posterior ratio test in both the region and boundary partition space. (ii) In modeling, we use discriminative learning methods to train conditional models for both region and boundary based on interactive scribbles. We exploit rich image features at multiple scales, and simultaneously incorporate the user's intention behind the interactive scribbles. (iii) In computing, we relax the energy function into an equivalent continuous form which is convex. Then, we adopt the Bregman iteration method to enforce the "coupling" of region and boundary terms with fast global convergence. In addition, a multigrid technique is further introduced, which is a coarse-to-fine mechanism and guarantees both feature discriminativeness and boundary preciseness by adjusting the size of image features gradually. The proposed interactive system is evaluated on three public datasets: Berkeley segmentation dataset, MSRC dataset and LHI dataset. Compared to five state-of-the-art approaches including Boykov et al., Bai et al., Grady, Unger et al. and Couprie et al., our system outperforms those established approaches in both accuracy and efficiency by a large margin and achieves state-of-the-art results. © 2010 ACM.

Wu Y.N.,University of California at Los Angeles | Si Z.,University of California at Los Angeles | Gong H.,University of California at Los Angeles | Gong H.,Lotus Hill Research Institute | And 2 more authors.
International Journal of Computer Vision | Year: 2010

This article proposes an active basis model, a shared sketch algorithm, and a computational architecture of sum-max maps for representing, learning, and recognizing deformable templates. In our generative model, a deformable template is in the form of an active basis, which consists of a small number of Gabor wavelet elements at selected locations and orientations. These elements are allowed to slightly perturb their locations and orientations before they are linearly combined to generate the observed image. The active basis model, in particular, the locations and the orientations of the basis elements, can be learned from training images by the shared sketch algorithm. The algorithm selects the elements of the active basis sequentially from a dictionary of Gabor wavelets. When an element is selected at each step, the element is shared by all the training images, and the element is perturbed to encode or sketch a nearby edge segment in each training image. The recognition of the deformable template from an image can be accomplished by a computational architecture that alternates the sum maps and the max maps. The computation of the max maps deforms the active basis to match the image data, and the computation of the sum maps scores the template matching by the log-likelihood of the deformed active basis. © 2009 The Author(s).

Zhao Y.,Beijing Institute of Technology | Zhao Y.,Lotus Hill Research Institute | Gong H.,Lotus Hill Research Institute | Gong H.,University of California at Los Angeles | And 4 more authors.
Pattern Recognition Letters | Year: 2012

This paper presents a novel background model for video surveillance - Spatio-Temporal Patch based Background Modeling (STPBM). We use spatio-temporal patches, called bricks, to characterize both the appearance and motion information. Our method is based on the observation that all the background bricks at a given location under all possible lighting conditions lie in a low dimensional background subspace, while bricks with moving foreground are widely distributed outside. An efficient online subspace learning method is presented to capture the subspace, which is able to model the illumination changes more robustly than traditional pixel-wise or block-wise methods. Experimental results demonstrate that the proposed method is insensitive to drastic illumination changes yet capable of detecting dim foreground objects under low contrast. Moreover, it outperforms the state-of-the-art in various challenging scenes with illumination changes. © 2012 Elsevier B.V. All rights reserved.
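The core observation, that background bricks lie near a low-dimensional subspace while foreground bricks fall outside it, can be illustrated with a residual test. This is a minimal sketch under stated assumptions: the subspace (`basis`, assumed orthonormal) and `mean` are taken as already learned, the online subspace update described in the abstract is not shown, and the function names and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence


def residual_norm(brick: Sequence[float],
                  basis: Sequence[Sequence[float]],
                  mean: Sequence[float]) -> float:
    """Distance from a flattened brick to the background subspace.

    Assumes `basis` holds orthonormal vectors of the same length as `brick`.
    """
    centered = [b - m for b, m in zip(brick, mean)]
    recon: List[float] = [0.0] * len(centered)
    for u in basis:
        # Projection coefficient onto one basis vector.
        coeff = sum(c * ui for c, ui in zip(centered, u))
        recon = [r + coeff * ui for r, ui in zip(recon, u)]
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, recon)))


def is_foreground(brick, basis, mean, threshold: float) -> bool:
    """Flag a brick as foreground when it lies far from the subspace."""
    return residual_norm(brick, basis, mean) > threshold


mean = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0]]        # toy one-dimensional background subspace
background_like = [5.0, 0.0, 0.0]  # scaled along the basis, as a lighting change might be
foreground_like = [5.0, 3.0, 4.0]  # has energy outside the subspace
```

In this toy setting a brick that is merely rescaled along the basis direction keeps a zero residual, which is the intuition behind tolerating illumination changes.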

Lin L.,Sun Yat Sen University | Lin L.,Huazhong University of Science and Technology | Liu X.,Huazhong University of Science and Technology | Liu X.,Lotus Hill Research Institute | And 4 more authors.
Pattern Recognition | Year: 2012

In this paper, we present a framework for object categorization via sketch graphs that incorporate shape and structure information. In this framework, we integrate the learnable And-Or graph model, a hierarchical structure that combines the reconfigurability of a stochastic context free grammar (SCFG) with the constraints of a Markov random field (MRF). For computational efficiency, we generalize instances from the And-Or graph models and perform a set of sequential tests for cascaded object categorization, rather than directly inferring with the And-Or graph models. We study 33 categories, each consisting of a small data set of 30 instances, and 30 additional templates with varied appearance are generalized from the learned And-Or graph model. These samples better span the appearance space and form an augmented training set ΩT of 1980 (60×33) training templates. To perform recognition on a testing image, we use a set of sequential tests to project ΩT into different representation spaces to narrow the number of candidate matches in ΩT. We use graphlets (structural elements) as our local features and model ΩT at each stage using histograms of graphlets over categories, histograms of graphlets over object instances, histograms of pairs of graphlets over objects, and shape context. Each test is increasingly computationally expensive, and by the end of the cascade we have a small candidate set remaining to use with our most powerful test, a top-down graph matching algorithm. We apply the proposed approach to a challenging public dataset of 33 object categories and achieve state-of-the-art performance. © 2012 Elsevier Ltd.
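The cascade idea, in which cheap tests prune the candidate set before more expensive ones run, can be sketched generically. The scoring functions and keep-counts below are hypothetical placeholders rather than the graphlet histograms or graph matching used in the paper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cascade(candidates: Sequence[T],
            tests: Sequence[Tuple[Callable[[T], float], int]]) -> List[T]:
    """Run tests in order; each keeps only its top-scoring candidates.

    Each entry in `tests` is (score_fn, keep): higher scores are better,
    and `keep` candidates survive to the next, costlier stage.
    """
    remaining = list(candidates)
    for score_fn, keep in tests:
        remaining = sorted(remaining, key=score_fn, reverse=True)[:keep]
    return remaining


# Toy usage: integers stand in for training templates.
templates = list(range(10))
stages = [
    (lambda t: -abs(t - 4.6), 4),  # cheap test: closeness to 4.6
    (lambda t: float(t), 2),       # costlier test: prefer larger values
]
survivors = cascade(templates, stages)
```

Ordering the stages from cheapest to most expensive means the costly final test only sees the few candidates that survive.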

Lin L.,Sun Yat Sen University | Lin L.,Lotus Hill Research Institute | Wang X.,Sun Yat Sen University | Yang W.,Sun Yat Sen University | Lai J.,Sun Yat Sen University
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2012

This paper proposes a simple yet effective method to learn the hierarchical object shape model consisting of local contour fragments, which represents a category of shapes in the form of an And-Or tree. This model extends the traditional hierarchical tree structures by introducing the switch variables (i.e. the or-nodes) that explicitly specify production rules to capture shape variations. We thus define the model with three layers: the leaf-nodes for detecting local contour fragments, the or-nodes specifying selection of leaf-nodes, and the root-node encoding the holistic distortion. In the training stage, for optimization of the And-Or tree learning, we extend the concave-convex procedure (CCCP) by embedding the structural clustering during the iterative learning steps. The inference of shape detection is consistent with the model optimization, which integrates the local tests via the leaf-nodes and or-nodes with the global verification via the root-node. The advantages of our approach are validated on the challenging shape databases (i.e., ETHZ and INRIA Horse) and summarized as follows. (1) The proposed method is able to accurately localize shape contours against unreliable edge detection and edge tracing. (2) The And-Or tree model enables us to capture the intraclass variance well. © 2012 IEEE.

Zhang J.,Beijing Institute of Technology | Zhang J.,Lotus Hill Research Institute | Hu W.,University of California at Los Angeles | Yao B.,University of California at Los Angeles | And 3 more authors.
Proceedings of the IEEE International Conference on Computer Vision | Year: 2011

In this paper, we present a method for inferring social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents and then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal locations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5-hour video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles including manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision. © 2011 IEEE.
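The final step, predicting a role from observed events under Bayesian inference, can be illustrated with a toy posterior computation. This is only a sketch: it assumes events are conditionally independent given the role, and the event names, probabilities and priors are hypothetical values, not figures from the paper. The Earley parsing stage is not shown.

```python
import math
from typing import Dict


def role_posterior(event_counts: Dict[str, int],
                   event_probs: Dict[str, Dict[str, float]],
                   priors: Dict[str, float]) -> Dict[str, float]:
    """Posterior over roles: P(role | events) is proportional to
    P(role) times the product of P(event | role) over observed events.

    Assumption: events are independent given the role.
    """
    log_post = {}
    for role, prior in priors.items():
        lp = math.log(prior)
        for event, n in event_counts.items():
            lp += n * math.log(event_probs[role][event])
        log_post[role] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    unnorm = {r: math.exp(v - m) for r, v in log_post.items()}
    z = sum(unnorm.values())
    return {r: v / z for r, v in unnorm.items()}


# Hypothetical two-role, two-event example.
priors = {"visitor": 0.5, "staff": 0.5}
event_probs = {
    "visitor": {"enter": 0.5, "use_printer": 0.5},
    "staff": {"enter": 0.2, "use_printer": 0.8},
}
counts = {"enter": 1, "use_printer": 3}
posterior = role_posterior(counts, event_probs, priors)
predicted = max(posterior, key=posterior.get)
```

With these made-up numbers the repeated printer use shifts the posterior toward the staff role.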

Xie Y.,Beijing Institute of Technology | Xie Y.,Lotus Hill Research Institute | Lin L.,Lotus Hill Research Institute | Lin L.,Sun Yat Sen University | Jia Y.,Beijing Institute of Technology
Proceedings - International Conference on Pattern Recognition | Year: 2010

Compared to traditional tracking with fixed cameras, PTZ-camera-based tracking is more challenging due to (i) the lack of reliable background modeling and subtraction; (ii) sudden and drastic changes in the appearance and scale of the target. To tackle these problems, this paper proposes a novel tracking algorithm using patch-based object models and demonstrates its advantages with the PTZ-camera in the application of visual surveillance. In our method, the target model is learned and represented by a set of feature patches whose discriminative power is higher than others. The target model is matched and evaluated by both appearance and motion consistency measurements. The homography between frames is also calculated for scale adaptation. The experiment on several surveillance videos shows that our method outperforms the state-of-the-art approaches. © 2010 IEEE.

Lin L.,Lotus Hill Research Institute | Lin L.,Sun Yat Sen University | Zeng K.,Lotus Hill Research Institute | Lv H.,Lotus Hill Research Institute | And 4 more authors.
NPAR Symposium on Non-Photorealistic Animation and Rendering | Year: 2010

We present an interactive system that stylizes an input video into a painterly animation. The system consists of two phases. The first is a Video Parsing phase that extracts and labels semantic objects with different material properties (skin, hair, cloth, and so on) in the video, and then establishes robust correspondence between frames for discriminative image features inside each object. The second Painterly Rendering phase performs the stylization based on the video semantics and feature correspondence. Compared to the previous work, the proposed method advances painterly animation in three aspects: Firstly, we render artistic painterly styles using a rich set of example-based brush strokes. These strokes, placed in multiple layers and passes, are automatically selected according to the video semantics. Secondly, we warp brush strokes according to global object deformations, so that the strokes appear to be tightly attached to the object surfaces. Thirdly, we propose a series of novel techniques to reduce the scintillation effects. Results applying our system to several video clips show that it produces expressive oil painting animations. © 2010 ACM.
