Zhang T.,CAS Institute of Automation | Zhang T.,China Singapore Institute of Digital Media | Liu S.,CAS Institute of Automation | Xu C.,CAS Institute of Automation | Lu H.,CAS Institute of Automation
IEEE Transactions on Industrial Informatics | Year: 2013

Automated visual surveillance systems are attracting extensive interest because of public security concerns. In this paper, we attempt to mine semantic context information, including object-specific context information and scene-specific context information (learned from object-specific context information), to build an intelligent system with robust object detection, tracking, and classification and abnormal event detection. By means of object-specific context information, a cotrained classifier, which takes advantage of the multiview information of objects and reduces the number of labeled training samples required, is learned to classify objects into pedestrians or vehicles with high classification performance. For each kind of object, we learn its corresponding semantic scene-specific context information: motion pattern, width distribution, paths, and entry/exit points. This information makes it possible to improve object detection, tracking, and abnormal event detection efficiently. Experimental results demonstrate the effectiveness of our semantic context features for multiple real-world traffic scenes. © 2005-2012 IEEE.

Bao B.-K.,CAS Institute of Automation | Bao B.-K.,China Singapore Institute of Digital Media | Zhu G.,University of California at Berkeley | Shen J.,Singapore Management University | Yan S.,National University of Singapore
IEEE Transactions on Image Processing | Year: 2013

Recent techniques based on sparse representation (SR) have demonstrated promising performance in high-level visual recognition, exemplified by highly accurate face recognition under occlusion and other sparse corruptions. Most research in this area has focused on classification algorithms using raw image pixels, and very few methods have been proposed to utilize quantized visual features, such as the popular bag-of-words feature abstraction. In such cases, besides the inherent quantization errors, ambiguity associated with visual word assignment and misdetection of feature points, due to factors such as visual occlusion and noise, constitute the major causes of dense corruptions of the quantized representation. The dense corruptions can jeopardize the decision process by distorting the patterns of the sparse reconstruction coefficients. In this paper, we aim to eliminate the corruptions and achieve robust image analysis with SR. Toward this goal, we introduce two transfer processes (ambiguity transfer and misdetection transfer) to account for the two major sources of corruption discussed above. By reasonably assuming the rarity of the two kinds of distortion processes, we augment the original SR-based reconstruction objective with ℓ0-norm regularization on the transfer terms to encourage sparsity and, hence, discourage dense distortion/transfer. Computationally, we relax the nonconvex ℓ0-norm optimization into a convex ℓ1-norm optimization problem, and employ the accelerated proximal gradient method, whose updating procedure is provably convergent. Extensive experiments on four benchmark datasets, Caltech-101, Caltech-256, Corel-5k, and CMU pose, illumination, and expression, demonstrate the necessity of removing the quantization corruptions and the various advantages of the proposed framework. © 1992-2012 IEEE.
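
The convex relaxation and accelerated proximal gradient step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not give the full objective or the transfer terms, so the code below solves only a generic ℓ1-regularized least-squares problem, min_x 0.5·||Ax − b||² + lam·||x||₁, using a FISTA-style update. The function names, the Frobenius-norm step-size bound, and the toy data are all assumptions made for illustration.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied element-wise."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def fista_lasso(A, b, lam, iters=2000):
    """Accelerated proximal gradient (FISTA-style) for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1 (illustrative only)."""
    n = len(A[0])
    # Squared Frobenius norm: a safe upper bound on the Lipschitz
    # constant of the gradient of the smooth term (assumed choice).
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    if L == 0.0:
        return x  # all-zero A: x = 0 is already optimal
    y, t = x[:], 1.0
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        # Gradient step on the smooth term, then the l1 proximal step.
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, grad)], lam / L)
        # Momentum (extrapolation) step that gives the acceleration.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Toy usage: with A = I the minimizer is the soft-thresholded b.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
x_hat = fista_lasso(A, b, lam=1.0)
```

With an identity matrix the problem separates per coordinate, so the result can be checked by hand: entries of b smaller in magnitude than lam are driven to zero, which is the sparsity-inducing behaviour the ℓ1 relaxation is chosen for.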

Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia | Year: 2012

Social media is becoming increasingly popular, and its users necessarily interact with each other to form social networks. The influence network, as one special case of a social network, has been recognized as significantly impacting social activities and user decisions. We emphasize in this paper that inter-user influence is essentially topic-sensitive, since for different tasks users tend to trust different influencers and be influenced most by them. While existing research focuses on global influence modeling and applies to text-based networks, this work investigates the problem of topic-sensitive influence modeling in the multimedia domain. We propose a multi-modal probabilistic model that considers both users' textual annotations and uploaded visual images. This model is capable of simultaneously extracting user topic distributions and topic-sensitive influence strengths. By identifying the topic-sensitive influencers, we are able to support applications like collective search and collaborative recommendation. A general risk minimization-based framework for personalized image search is further presented, in which the image search task is cast as measuring the distance between image and personalized query language models. The framework considers the noisy tag issue and enables easy incorporation of social influence. We have conducted experiments on a large-scale Flickr dataset. Qualitative as well as quantitative evaluation results have validated the effectiveness of the topic-sensitive influencer mining model, and demonstrated the advantage of incorporating topic-sensitive influence in personalized image search and topic-based image recommendation. © 2012 ACM.

Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
ACM Transactions on Multimedia Computing, Communications and Applications | Year: 2011

The overwhelming number of web videos returned from search engines makes effective browsing and search a challenging task. Rather than relying on a conventional ranked list, it becomes necessary to organize the retrieved videos in alternative ways. In this article, we explore the issue of mining topics from the retrieved web videos and organizing them into semantic clusters. We present a framework for clustering-based video retrieval and build a visualization user interface. A hierarchical topic structure is exploited to encode the characteristics of the retrieved video collection, and a semi-supervised hierarchical topic model is proposed to guide the topic hierarchy discovery. Carefully designed experiments on a web-scale video dataset collected from video sharing websites validate the proposed method and demonstrate that clustering-based video retrieval is a practical way to help users browse effectively. © 2011 ACM.

Zhang T.,CAS Institute of Automation | Zhang T.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
ACM Transactions on Multimedia Computing, Communications and Applications | Year: 2014

With the massive growth of events on the Internet, efficient organization and monitoring of events becomes a practical challenge. To deal with this problem, we propose a novel CO-PMHT (CO-Probabilistic Multi-Hypothesis Tracking) algorithm for cross-domain multi-event tracking to obtain their informative summary details and evolutionary trends over time. We collect a large-scale dataset by searching keywords on two domains (Google News and Flickr) and downloading both images and textual content for each event. Given the input data, our algorithm can track multiple events in the two domains collaboratively and boost the tracking performance. Specifically, the bridge between the two domains is a semantic posterior probability, which avoids the domain gap. After tracking, we can visualize the whole evolutionary process of an event over time and mine the semantic topics of each event for deep understanding and event prediction. Extensive experimental evaluations on the collected dataset demonstrate the effectiveness of the proposed algorithm for cross-domain multi-event tracking.