China Singapore Institute of Digital Media

Singapore, Singapore


Bao B.-K.,CAS Institute of Automation | Bao B.-K.,China Singapore Institute of Digital Media | Zhu G.,University of California at Berkeley | Shen J.,Singapore Management University | Yan S.,National University of Singapore
IEEE Transactions on Image Processing | Year: 2013

Recent techniques based on sparse representation (SR) have demonstrated promising performance in high-level visual recognition, exemplified by highly accurate face recognition under occlusion and other sparse corruptions. Most research in this area has focused on classification algorithms using raw image pixels, and very few methods have been proposed to utilize quantized visual features, such as the popular bag-of-words abstraction. In such cases, besides the inherent quantization errors, ambiguity in visual word assignment and misdetection of feature points, due to factors such as visual occlusions and noise, constitute the major causes of dense corruptions in the quantized representation. These dense corruptions can jeopardize the decision process by distorting the patterns of the sparse reconstruction coefficients. In this paper, we aim to eliminate the corruptions and achieve robust image analysis with SR. Toward this goal, we introduce two transfer processes (ambiguity transfer and misdetection transfer) to account for the two major sources of corruption discussed above. By reasonably assuming the rarity of the two kinds of distortion processes, we augment the original SR-based reconstruction objective with ℓ0-norm regularization on the transfer terms to encourage sparsity and, hence, discourage dense distortion/transfer. Computationally, we relax the nonconvex ℓ0-norm optimization into a convex ℓ1-norm optimization problem, and employ the accelerated proximal gradient method with a convergence-provable updating procedure. Extensive experiments on four benchmark datasets, Caltech-101, Caltech-256, Corel-5k, and CMU pose, illumination, and expression (PIE), demonstrate the necessity of removing the quantization corruptions and the various advantages of the proposed framework. © 1992-2012 IEEE.
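The convex relaxation described in the abstract — replacing the ℓ0 penalty with an ℓ1 penalty and solving by proximal gradient — can be illustrated with a minimal, un-accelerated sketch (plain ISTA rather than the paper's accelerated method, and a toy random dictionary `D` and signal `y` rather than real visual features):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1-norm: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, n_iter=500):
    """Minimize 0.5*||y - D@a||^2 + lam*||a||_1 by proximal gradient descent."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)         # gradient of the least-squares term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Toy check: y is synthesized from two dictionary atoms, so the recovered
# coefficient vector should be sparse with support {2, 7}.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))
a_true = np.zeros(10)
a_true[[2, 7]] = [1.0, -2.0]
y = D @ a_true
a_hat = ista(D, y, lam=0.01)
print(np.flatnonzero(np.abs(a_hat) > 0.1))   # → [2 7]
```

The accelerated variant used in the paper adds a momentum step between iterations but relies on the same soft-thresholding operator.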


Bao B.-K.,CAS Institute of Automation | Bao B.-K.,China Singapore Institute of Digital Media | Liu G.,University of Illinois at Urbana - Champaign | Xu C.,CAS Institute of Automation | And 2 more authors.
IEEE Transactions on Image Processing | Year: 2012

In this paper, we address the error correction problem, that is, uncovering the low-dimensional subspace structure from high-dimensional observations that are possibly corrupted by errors. When the errors follow a Gaussian distribution, principal component analysis (PCA) can find the optimal (in the least-squares sense) low-rank approximation to high-dimensional data. However, the canonical PCA method is known to be extremely fragile in the presence of gross corruptions. Recently, Wright et al. established the so-called robust principal component analysis (RPCA) method, which handles grossly corrupted data well. However, RPCA is a transductive method and does not handle well new samples that are not involved in the training procedure. Given a new datum, RPCA essentially needs to recompute over all the data, resulting in high computational cost; RPCA is therefore inappropriate for applications that require fast online computation. To overcome this limitation, we propose an inductive robust principal component analysis (IRPCA) method. Given a set of training data, unlike RPCA, which aims to recover the original data matrix, IRPCA aims to learn the underlying projection matrix, which can be used to efficiently remove possible corruptions in any datum. The learning is done by solving a nuclear-norm regularized minimization problem, which is convex and can be solved in polynomial time. Extensive experiments on a benchmark human face dataset and two video surveillance datasets show that IRPCA is not only robust to gross corruptions but also handles new data well and efficiently. © 1992-2012 IEEE.
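Nuclear-norm regularized convex programs such as IRPCA's are commonly solved with proximal methods whose core step is singular value thresholding. The sketch below shows only that generic step, not the paper's full solver; the rank-3 toy matrix and threshold are stand-ins:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear
    norm, the workhorse step in nuclear-norm regularized minimization."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy check: a rank-3 matrix plus small dense noise. Thresholding at a
# level above the noise singular values yields a rank-3 approximation.
rng = np.random.default_rng(1)
L = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 30))
X = L + 0.01 * rng.standard_normal((30, 30))
X_low = svt(X, tau=1.0)
print(np.linalg.matrix_rank(X_low))   # → 3
```

A full solver alternates such a thresholding step with gradient or multiplier updates on the remaining terms of the objective.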


Min W.,CAS Institute of Automation | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media | Xu M.,University of Technology, Sydney | And 3 more authors.
IEEE Transactions on Multimedia | Year: 2014

Landmark search is crucial to improving the quality of the travel experience, and smart phones make it possible to search for landmarks anytime and anywhere. Most existing work computes image features locally on the smart phone after a landmark image is taken; compared with sending the original image to a remote server, sending computed features saves network bandwidth and consequently speeds up transmission. However, this scheme is restricted by the limitations of phone battery power and computational ability. In this paper, we propose to send compressed (low-resolution) images to the remote server, instead of computing image features locally, for landmark recognition and search. To this end, a robust 3D-model-based method is proposed to match query images with their corresponding landmarks. Using the proposed method, low-resolution images can be recognized accurately, even when they contain only a small part of the landmark or are taken under varying conditions of lighting, zoom, occlusion, and viewpoint. To provide an attractive landmark search result, a 3D texture model is generated in response to a landmark query. The proposed search approach, which opens up a new direction, starts from a 2D compressed-image query and ends with a 3D model as the search result. © 2014 IEEE.


Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
ACM Transactions on Multimedia Computing, Communications and Applications | Year: 2011

The overwhelming number of Web videos returned by search engines makes effective browsing and search a challenging task. Rather than the conventional ranked list, it becomes necessary to organize the retrieved videos in alternative ways. In this article, we explore the issue of mining topics and organizing the retrieved web videos into semantic clusters. We present a framework for clustering-based video retrieval and build a visualization user interface. A hierarchical topic structure is exploited to encode the characteristics of the retrieved video collection, and a semi-supervised hierarchical topic model is proposed to guide the topic hierarchy discovery. Carefully designed experiments on a web-scale video dataset collected from video sharing websites validate the proposed method and demonstrate that clustering-based video retrieval is practical for facilitating effective browsing. © 2011 ACM.


Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia | Year: 2012

Social media is becoming increasingly popular, with users necessarily interacting with one another to form social networks. The influence network, one special case of a social network, has been recognized as significantly impacting social activities and user decisions. We emphasize in this paper that inter-user influence is essentially topic-sensitive, as for different tasks users tend to trust different influencers and be influenced most by them. While existing research focuses on global influence modeling and applies to text-based networks, this work investigates the problem of topic-sensitive influence modeling in the multimedia domain. We propose a multi-modal probabilistic model, considering both users' textual annotations and uploaded visual images, that is capable of simultaneously extracting user topic distributions and topic-sensitive influence strengths. By identifying topic-sensitive influencers, we are able to conduct applications such as collective search and collaborative recommendation. A risk-minimization-based general framework for personalized image search is further presented, in which the image search task is transformed into measuring the distance between the image and personalized query language models. The framework accounts for the noisy-tag issue and enables easy incorporation of social influence. We have conducted experiments on a large-scale Flickr dataset. Qualitative as well as quantitative evaluation results have validated the effectiveness of the topic-sensitive influencer mining model and demonstrated the advantage of incorporating topic-sensitive influence in personalized image search and topic-based image recommendation. © 2012 ACM.


Zhang T.,CAS Institute of Automation | Zhang T.,China Singapore Institute of Digital Media | Liu S.,CAS Institute of Automation | Xu C.,CAS Institute of Automation | Lu H.,CAS Institute of Automation
IEEE Transactions on Industrial Informatics | Year: 2013

Automated visual surveillance systems are attracting extensive interest due to public security concerns. In this paper, we attempt to mine semantic context information, including object-specific context information and scene-specific context information (learned from the object-specific context), to build an intelligent system with robust object detection, tracking, and classification and abnormal event detection. By means of object-specific context information, a co-trained classifier, which takes advantage of multi-view information about objects and reduces the number of labeled training samples required, is learned to classify objects into pedestrians or vehicles with high classification performance. For each kind of object, we learn its corresponding semantic scene-specific context information: motion pattern, width distribution, paths, and entry/exit points. This information is then used to improve object detection and tracking and abnormal event detection. Experimental results demonstrate the effectiveness of our semantic context features on multiple real-world traffic scenes. © 2005-2012 IEEE.
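The co-training idea — two views of each object, with each view's classifier pseudo-labeling its most confident unlabeled samples to cut labeling effort — can be sketched with toy nearest-centroid learners standing in for the paper's classifiers (all data and names here are hypothetical):

```python
import numpy as np

def fit_centroids(X, y):
    """Per-class mean vectors: a tiny stand-in for one view's classifier."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_with_confidence(X, classes, centroids):
    """Predicted class plus a margin-style confidence for each sample."""
    d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
    nearest = np.argmin(d, axis=1)
    d_sorted = np.sort(d, axis=1)
    margin = d_sorted[:, 1] - d_sorted[:, 0]   # gap to the runner-up class
    return classes[nearest], margin

def co_train(X1, X2, y_partial, rounds=12, add_per_round=2):
    """y_partial holds a class label or -1 for unlabeled. Each round, each
    view's classifier pseudo-labels its most confident unlabeled samples."""
    y = y_partial.copy()
    for _ in range(rounds):
        for X in (X1, X2):
            unlabeled = np.flatnonzero(y == -1)
            if unlabeled.size == 0:
                return y
            labeled = y != -1
            classes, cents = fit_centroids(X[labeled], y[labeled])
            pred, conf = predict_with_confidence(X, classes, cents)
            pick = unlabeled[np.argsort(-conf[unlabeled])[:add_per_round]]
            y[pick] = pred[pick]               # add confident pseudo-labels
    return y

# Toy data: two well-separated classes (think pedestrian vs. vehicle) seen
# through two views, with only one labeled example per class.
rng = np.random.default_rng(2)
n = 20
true = np.repeat([0, 1], n)
X1 = rng.normal(0.0, 0.3, (2 * n, 2)); X1[true == 1] += 3.0   # view 1
X2 = rng.normal(0.0, 0.3, (2 * n, 2)); X2[true == 1] += 3.0   # view 2
y0 = np.full(2 * n, -1)
y0[[0, n]] = true[[0, n]]
y_hat = co_train(X1, X2, y0)
print((y_hat == true).mean())   # fraction of correctly labeled samples
```

The benefit mirrors the abstract's claim: starting from one label per class, the two views bootstrap each other to a fully labeled training set.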


Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
MM'10 - Proceedings of the ACM Multimedia 2010 International Conference | Year: 2010

A decent movie summary helps the producer promote the movie and the audience capture its theme before watching the whole film. Most existing automatic movie summarization approaches rely heavily on video content alone, which may not deliver ideal results due to the semantic gap between computer-calculated low-level features and the high-level understanding used by humans. In this paper, we incorporate the script into movie analysis and propose a novel character-based movie summarization approach, motivated by modern film theory, which holds that what actually catches the audience's attention is the character. We first segment the scenes in the movie by analyzing and aligning the script with the movie. We then conduct substory discovery and content attention analysis based on the scene analysis and character interaction features. Given the obtained movie structure and content attention values, we calculate movie attraction scores at both the shot and scene levels and adopt these as the criterion for generating the movie summary. Promising experimental results demonstrate that character analysis is effective for movie summarization and movie content understanding. © 2010 ACM.


Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media | Lu D.,CAS Institute of Automation
IEEE Transactions on Multimedia | Year: 2012

Increasingly developed social sharing websites such as Flickr and YouTube allow users to create, share, annotate, and comment on media. The large-scale user-generated metadata not only facilitate users in sharing and organizing multimedia content, but also provide useful information for improving media retrieval and management. Personalized search is one such example, where the web search experience is improved by generating the returned list according to the identified user search intents. In this paper, we exploit social annotations and propose a novel framework that simultaneously considers user and query relevance to learn personalized image search. The basic premise is to embed the user preference and query-related search intent into user-specific topic spaces. Since users' original annotations are too sparse for topic modeling, we enrich users' annotation pools before constructing the user-specific topic spaces. The proposed framework contains two components: 1) a ranking-based multicorrelation tensor factorization model that performs annotation prediction, producing what are treated as users' potential annotations for the images; and 2) user-specific topic modeling that maps the query relevance and user preference into the same user-specific topic space. For performance evaluation, two resources involving users' social activities are employed. Experiments on a large-scale Flickr dataset demonstrate the effectiveness of the proposed method. © 2012 IEEE.
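The core scoring idea — combining user preference and query relevance in a shared topic space — can be sketched as follows. The distributions below are hand-picked toy stand-ins for what the paper learns via tensor factorization and topic modeling:

```python
import numpy as np

def personalized_scores(user_topics, image_topics, query_topics):
    """Score images against a combined user-preference / query-intent
    distribution in a shared topic space (higher = more relevant)."""
    intent = user_topics * query_topics      # combine preference and query
    intent = intent / intent.sum()           # renormalize to a distribution
    return image_topics @ intent

user  = np.array([0.6, 0.3, 0.1])            # user mostly prefers topic 0
query = np.array([0.2, 0.7, 0.1])            # query leans toward topic 1
imgs  = np.array([[0.8, 0.1, 0.1],           # image 0: mostly topic 0
                  [0.1, 0.8, 0.1],           # image 1: mostly topic 1
                  [0.1, 0.1, 0.8]])          # image 2: mostly topic 2
ranking = np.argsort(-personalized_scores(user, imgs, query))
print(ranking)   # → [1 0 2]
```

Two users issuing the same query get different rankings because the combined intent vector is weighted by each user's own topic distribution.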


Sang J.,CAS Institute of Automation | Sang J.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
Journal of Multimedia | Year: 2012

The overwhelming number of web videos posted on social media websites makes effective browsing and search a challenging task. User-provided metadata has proved useful in large-scale video organization and retrieval. Search result clustering, which utilizes the associated metadata to cluster the returned results into semantic groups according to the subtopics involved, has shown its advantages. Most existing work on search result clustering is devoted to solving the ambiguity problem arising from general queries. In this paper, we propose the problem of faceted subtopic retrieval, which focuses on more complex queries concerning political and social events or issues. A hierarchical topic model (hLDA) is adapted to exploit the intrinsic topic hierarchy inside the retrieved collections. Furthermore, this paper offers a new perspective on multi-modal video analysis by exploring pairwise visual cues derived from duplicate detection for constrained topic modeling. We modify the standard hierarchical topic model by integrating: 1) query-related Supervision knowledge (ShLDA) and 2) duplicate Relation constraints (RShLDA). Carefully designed experiments on a web-scale video dataset validate the proposed method. © 2012 Academy Publisher.


Zhang T.,CAS Institute of Automation | Zhang T.,China Singapore Institute of Digital Media | Xu C.,CAS Institute of Automation | Xu C.,China Singapore Institute of Digital Media
ACM Transactions on Multimedia Computing, Communications and Applications | Year: 2014

With the massive growth of events on the Internet, efficient organization and monitoring of events has become a practical challenge. To deal with this problem, we propose a novel CO-PMHT (CO-Probabilistic Multi-Hypothesis Tracking) algorithm for cross-domain multi-event tracking, which obtains informative summary details and evolutionary trends of events over time. We collect a large-scale dataset by searching keywords in two domains (Google News and Flickr) and downloading both images and textual content for each event. Given the input data, our algorithm tracks multiple events in the two domains collaboratively and boosts tracking performance. Specifically, the bridge between the two domains is a semantic posterior probability, which avoids the domain gap. After tracking, we can visualize the whole evolutionary process of an event over time and mine the semantic topics of each event for deep understanding and event prediction. Extensive experimental evaluations on the collected dataset demonstrate the effectiveness of the proposed algorithm for cross-domain multi-event tracking.
