Feng J.,National University of Singapore |
Ni B.,Illinois at Singapore Pte Ltd. |
Tian Q.,University of Texas at San Antonio |
Yan S.,National University of Singapore
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2011
Modern visual classification models generally include a feature pooling step, which aggregates local features over the region of interest into a statistic through a spatial pooling operation. Two commonly used operations are average pooling and max pooling. However, recent theoretical analysis has indicated that neither of these two pooling techniques may be optimal. We further reveal in this work that more severe limitations of these two pooling methods come from the unrecoverable loss of spatial information during the statistical summarization and from the underlying over-simplified assumption about the feature distribution. We aim to address these inherent issues and generalize previous pooling methods as follows. We define a weighted ℓp-norm spatial pooling function tailored to the class-specific feature spatial distribution. Moreover, a sensible prior for the feature spatial correlation is incorporated. Optimizing this pooling function towards optimal class separability yields a so-called geometric ℓp-norm pooling (GLP) method. The GLP method is capable of preserving the class-specific spatial/geometric information in the pooled features and significantly boosts the discriminating capability of the resultant features for image classification. Comprehensive evaluations on several image benchmarks demonstrate that the proposed GLP method, with a single type of feature, achieves image classification performance that outperforms or is comparable with the state of the art. © 2011 IEEE.
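As a rough illustration of how a weighted ℓp-norm can generalize the two classical operations, the Python sketch below pools a list of local feature responses as (Σ w_i |x_i|^p)^(1/p). The function name `weighted_lp_pool` and the hand-set uniform weights are assumptions made for illustration only. The abstract states that the pooling function is optimized for class separability, and that learning step is not shown here.

```python
def weighted_lp_pool(features, weights, p):
    """Pool local feature responses with a weighted lp-norm.

    Hypothetical helper: computes (sum_i w_i * |x_i|**p) ** (1/p).
    The weights are supplied by the caller; nothing is learned here.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    total = sum(w * (abs(x) ** p) for x, w in zip(features, weights))
    return total ** (1.0 / p)


responses = [1.0, 2.0, 3.0, 4.0]
uniform = [1.0 / len(responses)] * len(responses)

# p = 1 with uniform weights reduces to average pooling.
average_like = weighted_lp_pool(responses, uniform, p=1)

# A large p pushes the result towards the largest response (max pooling).
max_like = weighted_lp_pool(responses, uniform, p=50)
```

With uniform weights, p = 1 returns the mean of the responses (2.5 here), and increasing p moves the result towards the largest response. Non-uniform weights let individual spatial positions count more or less, which is the part a geometric pooling method would tune.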
Fu T.Z.J.,Illinois at Singapore Pte Ltd. |
Song Q.,Chinese University of Hong Kong |
Chiu D.M.,Chinese University of Hong Kong
Scientometrics | Year: 2014
By means of their academic publications, authors form a social network. Instead of sharing casual thoughts and photos (as in Facebook), authors select co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Academic Search and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What type of information and queries would be useful for users to discover, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues, and institutions. We go beyond traditional metrics such as paper counts, citations, and h-index. Specifically, we define metrics such as influence, connections, and exposure for authors. An author gains influence by receiving more citations, and in particular citations from influential authors. An author increases his or her connections by co-authoring with other authors, especially with authors who themselves have high connections. An author receives exposure by publishing in selective venues where publications have received high citations in the past, and the selectivity of these venues also depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics, and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors’ rankings for each type of metric as well as for different domains. We are prepared to demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors. © 2014, Akadémiai Kiadó, Budapest, Hungary.
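The recursive idea behind the influence metric (citations count for more when they come from influential authors) can be sketched with a PageRank-style iteration over an author citation graph. The abstract does not give the exact definition, so everything in the block below is an assumption: the function name `influence_scores`, the damping factor of 0.85, the fixed iteration count, and the even redistribution of score from authors who cite no one.

```python
def influence_scores(citations, damping=0.85, iterations=50):
    """PageRank-style influence over an author citation graph (illustrative).

    citations maps each citing author to the list of authors they cite.
    The damping factor and iteration count are assumed values.
    """
    authors = set(citations)
    for cited in citations.values():
        authors.update(cited)
    authors = sorted(authors)
    n = len(authors)
    score = {a: 1.0 / n for a in authors}

    for _ in range(iterations):
        # Every author starts each round with the same small base score.
        new = {a: (1.0 - damping) / n for a in authors}
        for a in authors:
            cited = citations.get(a, [])
            if cited:
                # Split this author's score among the authors they cite.
                share = damping * score[a] / len(cited)
                for c in cited:
                    new[c] += share
            else:
                # Authors who cite no one spread their score evenly.
                share = damping * score[a] / n
                for c in authors:
                    new[c] += share
        score = new
    return score


# D and F each receive exactly one citation, but D's comes from C,
# who is cited by two authors, while F's comes from the uncited E.
graph = {"A": ["C"], "B": ["C"], "C": ["D"], "E": ["F"]}
scores = influence_scores(graph)
```

In this toy graph D ends up with a higher score than F even though both have a single citation, which is the behaviour the abstract describes for influence.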
Xu J.,Northeastern University China |
Zhang Z.,Illinois at Singapore Pte Ltd. |
Tung A.K.H.,National University of Singapore |
Yu G.,Northeastern University China
VLDB Journal | Year: 2012
Advances in geographical tracking, multimedia processing, information extraction, and sensor networks have created a deluge of probabilistic data. While similarity search is an important tool to support the manipulation of probabilistic data, it raises new challenges for traditional relational databases. The problem stems from the limited effectiveness of the distance metrics employed by existing database systems. On the other hand, several more complicated distance operators have proven their value through better distinguishing ability in specific probabilistic domains. In this paper, we discuss the similarity search problem with respect to Earth Mover's Distance (EMD). EMD is the most successful distance metric for probability distribution comparison but is an expensive operator, as it has cubic time complexity. We present a new database indexing approach to answer EMD-based similarity queries, including range queries and k-nearest neighbor queries on probabilistic data. Our solution utilizes primal-dual theory from linear programming and employs a group of B+ trees for effective candidate pruning. We also apply our filtering technique to the processing of continuous similarity queries, especially with applications to frame copy detection in real-time videos. Extensive experiments show that our proposals dramatically improve the usefulness and scalability of probabilistic data management. © 2011 Springer-Verlag.
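The pruning idea can be sketched for a simplified case. By linear programming duality, any feasible dual solution of the EMD transportation problem gives a lower bound on EMD, and that bound is a difference of two scalar keys, one per record, which can be stored in an ordered index. The Python block below assumes one-dimensional histograms of equal total mass with ground distance |i - j|, picks the dual solution φ_i = i, π_j = -j (feasible because i - j ≤ |i - j|), and uses a sorted list with `bisect` as a stand-in for a single B+ tree. These choices and the names `emd_1d`, `dual_key`, and `range_query` are illustrative assumptions, not the paper's index.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def emd_1d(p, q):
    """Exact EMD for equal-mass histograms on bins 0..n-1 with ground
    distance |i - j|: the summed absolute difference of the two CDFs."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def dual_key(p):
    """Scalar key sum_i i * p[i], from the dual solution phi_i = i.
    For any p, q: |dual_key(p) - dual_key(q)| <= emd_1d(p, q)."""
    return sum(i * mass for i, mass in enumerate(p))


def range_query(records, query, radius):
    """Return (indices of records within radius of query, candidate count)."""
    # Sorted keys stand in for one B+ tree over the records.
    keyed = sorted((dual_key(r), idx) for idx, r in enumerate(records))
    keys = [k for k, _ in keyed]
    qk = dual_key(query)
    # Filter: only keys within radius of the query key can qualify.
    lo = bisect_left(keys, qk - radius)
    hi = bisect_right(keys, qk + radius)
    candidates = [idx for _, idx in keyed[lo:hi]]
    # Refine: run the exact distance on the surviving candidates only.
    hits = sorted(i for i in candidates if emd_1d(records[i], query) <= radius)
    return hits, len(candidates)


records = [[1, 0, 0, 0], [0, 0, 0, 1], [0.5, 0.5, 0, 0], [0, 1, 0, 0]]
hits, checked = range_query(records, [1, 0, 0, 0], radius=0.5)
```

In this example only two of the four records survive the key filter, so the exact distance is computed twice instead of four times. Using several different dual solutions, each with its own index, would tighten the filter, in the spirit of the "group of B+ trees" the abstract mentions.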
Chen B.,Illinois at Singapore Pte Ltd. |
Zhou Z.,National University of Singapore |
Yu H.,National University of Singapore
Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM | Year: 2013
Counting the number of RFID tags, or RFID counting, is needed by a wide array of important wireless applications. Motivated by its paramount practical importance, researchers have developed an impressive arsenal of techniques to improve the performance of RFID counting (i.e., to reduce the time needed to do the counting). This paper aims to gain deeper and more fundamental insights into this subject to facilitate future research on this topic. As our central thesis, we find that the overlooked key design aspect for RFID counting protocols to achieve near-optimal performance is a conceptual separation of a protocol into two phases. The first phase uses small overhead to obtain a rough estimate, and the second phase uses the rough estimate to further achieve an accuracy target. Our thesis also indicates that other performance-enhancing techniques or ideas proposed in the literature are only of secondary importance. Guided by our central thesis, we manage to design near-optimal protocols that are more efficient than existing ones and simultaneously simpler than most of them. © 2013 by the Association for Computing Machinery, Inc.
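The two-phase separation can be illustrated with a small simulation. This is a generic sketch and not the protocol designed in the paper: the probing scheme in phase one, the empty-slot estimator in phase two, and the names `rough_estimate`, `refine`, and `two_phase_count` are all assumptions. In phase one, tags respond with probability 2^-i in slot i until a slot is idle, giving an order-of-magnitude estimate in a handful of slots. In phase two, the response probability is set from that estimate and the fraction of empty slots is inverted to get the count.

```python
import math
import random


def rough_estimate(n_tags, rng, max_rounds=32):
    """Phase 1: halve the response probability until a slot is idle."""
    for i in range(1, max_rounds + 1):
        p = 2.0 ** -i
        if not any(rng.random() < p for _ in range(n_tags)):
            return 2 ** i
    return 2 ** max_rounds


def refine(empty_slots, total_slots, p):
    """Invert E[empty fraction] = (1 - p)**n to estimate n."""
    empty = max(empty_slots, 0.5)  # avoid log(0) when no slot is empty
    return math.log(empty / total_slots) / math.log(1.0 - p)


def two_phase_count(n_tags, slots, rng):
    """Simulate both phases for a hidden population of n_tags tags."""
    rough = rough_estimate(n_tags, rng)
    # Phase 2: tune the per-slot response probability to the rough estimate.
    p = 1.0 / rough
    empty = sum(
        1 for _ in range(slots)
        if not any(rng.random() < p for _ in range(n_tags))
    )
    return refine(empty, slots, p)


rng = random.Random(7)  # fixed seed so the run is repeatable
estimate = two_phase_count(n_tags=500, slots=1000, rng=rng)
```

Phase one costs only about log2(n) slots, while the accuracy of the final estimate is controlled by the number of slots spent in phase two. That split between a cheap rough estimate and a tunable refinement is the design point the abstract emphasizes.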
Cai R.,Guangdong University of Technology |
Cai R.,Nanjing University |
Zhang Z.,Illinois at Singapore Pte Ltd. |
Hao Z.,Guangdong University of Technology
Neural Networks | Year: 2013
With the advances in biomedical techniques over the last decade, the costs of human genomic sequencing and genomic activity monitoring are coming down rapidly. To support the huge genome-based business in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising applications, which may help potential patients to estimate the risk of certain genetic diseases and locate the target gene for further genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be directly applied to find the accurate causal relationship between genes and diseases. This is mainly due to the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification, utilizing a new combinatorial formulation over V-Structures commonly used in conventional Bayesian networks, by exploring the combinations of significant V-Structures. We prove the NP-hardness of the combinatorial search problem under general settings of the significance measure on the V-Structures, and present a greedy algorithm to find sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, particularly with interesting findings on the causal genes over real human genome data. © 2013 Elsevier Ltd.