Provident Technology Pte. Ltd.

Singapore, Singapore


Hu Q.,Zhejiang University | Chiew K.,Provident Technology Pte. Ltd. | Huang H.,National University of Singapore | He Q.,Zhejiang University
SIAM International Conference on Data Mining 2014, SDM 2014 | Year: 2014

Data sets collected from crowdsourcing platforms are well known for their low cost. But low cost may lead to low quality, i.e., labels may be incorrect or missing. Most existing work focuses on modeling the labeling errors of crowd workers, but missing labels can also cause problems when modeling the data. In this paper, we present an algorithm to predict the missing labels of crowd workers, in which we adopt ideas from semi-supervised learning and utilize the particular consistency between crowd workers. We also define the consistency between workers by crowd labels and develop an algorithm to learn it from the data automatically. Experiments on both benchmark and real data show that our algorithm outperforms traditional semi-supervised learning algorithms in predicting missing labels, and the recovered crowd labels are capable of predicting the ground truth and reflecting real properties of crowd workers. Copyright © SIAM.


Hu Q.,Zhejiang University | He Q.,Zhejiang University | Huang H.,National University of Singapore | Chiew K.,Provident Technology Pte. Ltd. | Liu Z.,Zhejiang University
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

Crowdsourcing services have proven efficient in collecting large amounts of labeled data for supervised learning, but the low cost of crowd workers leads to unreliable labels. Although various methods have been proposed to infer the ground truth or learn from crowd data directly, there is no guarantee that these methods work well for highly biased or noisy crowd labels. Motivated by this limitation of crowd data, we propose to improve the performance of crowdsourcing learning tasks with some additional expert labels by treating each labeler as a personal classifier and combining all labelers' opinions from a model combination perspective. Experiments show that our method can significantly improve the learning quality as compared with methods that use crowd labels alone. © 2014 Springer International Publishing.


Liu Z.,Zhejiang University | Huang H.,National University of Singapore | He Q.,Zhejiang University | Chiew K.,Provident Technology Pte. Ltd. | Ma L.,Zhejiang University
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

Rare category detection (RCD) aims at finding at least one data example of each rare category in an unlabeled data set, with the help of a labeling oracle, to prove the existence of such a rare category. Various approaches have been proposed for RCD with quadratic or even cubic time complexity. In this paper, by using histogram density estimation and wavelet analysis, we propose the FRED algorithm and its prior-free version, the iFRED algorithm, for RCD, both of which achieve linear time complexity w.r.t. either the data set size N or the data dimension d. Theoretical analysis guarantees their effectiveness, and comprehensive experiments on both synthetic and real data sets verify the effectiveness and efficiency of our algorithms. © 2014 Springer International Publishing.


Huang H.,Zhejiang University | Huang H.,National University of Singapore | Gao Y.,Zhejiang University | Chiew K.,Provident Technology Pte. Ltd. | And 2 more authors.
Expert Systems with Applications | Year: 2014

Poly-relational networks such as social networks are prevalent in the real world. The existing research on poly-relational networks focuses on community detection, aiming to find a global partition of nodes across relations. However, in some real cases, users may not be interested in such a global partition. For example, commercial analysts often care more about the top-k core members in business competitions, and the relations among them that matter most to their competitions. Motivated by this, in this paper, we investigate an unsupervised analysis of the top-k core members in a poly-relational network and identify two complementary tasks, namely (1) detection of the top-k core members that are most tightly connected by relevant relations, and (2) identification of the relevant relations via analysis of the importance of each relation to the formation of the top-k core members. To this end, we propose an optimization framework to jointly deal with the two tasks by maximizing the connectivity between the candidates of the top-k core members across all relations with a synchronously updated weight for each relation. The effectiveness of our framework is verified both theoretically and experimentally. © 2014 Elsevier Ltd. All rights reserved.


Huang H.,Zhejiang University | Huang H.,National University of Singapore | Chiew K.,Provident Technology Pte. Ltd. | Gao Y.,Zhejiang University | And 2 more authors.
Expert Systems with Applications | Year: 2014

Rare category discovery aims at identifying unlabeled data examples of rare categories in a given data set. The existing approaches to rare category discovery often need a certain number of labeled data examples as the training set, which are usually difficult and expensive to acquire in practice. However, if these methods use only a small training set to save cost, their accuracy may not be satisfactory for real applications. In this paper, for the first time, we propose the concept of rare category exploration, aiming to discover all data examples of a rare category from a seed (which is a labeled data example of this rare category) instead of from a training set. To this end, we present an approach known as the FRANK algorithm, which transforms rare category exploration into local community detection from a seed in a kNN (k-nearest neighbors) graph with an automatically selected k value. Extensive experimental results on real data sets verify the effectiveness and efficiency of our FRANK algorithm. © 2014 Elsevier Ltd. All rights reserved.


Huang H.,National University of Singapore | Gao Y.,Zhejiang University | Gao Y.,Provident Technology Pte. Ltd. | Chiew K.,Hong Kong University of Science and Technology | And 2 more authors.
Proceedings - International Conference on Data Engineering | Year: 2014

Mining arbitrary shaped clusters in large data sets is an open challenge in data mining. Various approaches to this problem have been proposed, but with high time complexity. To save computational cost, some algorithms try to shrink a data set to a smaller number of representative data examples. However, their user-defined shrinking ratios may significantly affect the clustering performance. In this paper, we present CLASP, an effective and efficient algorithm for mining arbitrary shaped clusters. It automatically shrinks the size of a data set while effectively preserving the shape information of clusters in the data set with representative data examples. Then, it adjusts the positions of these representative data examples to enhance their intrinsic relationship and make the cluster structures clearer and more distinct for clustering. Finally, it performs agglomerative clustering to identify the cluster structures with the help of a mutual k-nearest neighbors-based similarity metric called Pk. Extensive experiments on both synthetic and real data sets are conducted, and the results verify the effectiveness and efficiency of our approach. © 2014 IEEE.
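
The abstract above builds its similarity metric on mutual k-nearest neighbors. As a minimal sketch of that underlying relation only (not the Pk metric itself, whose definition is not given here), two points can be treated as mutual neighbors when each appears in the other's k-nearest list. The function names `k_nearest` and `mutual_knn_pairs` are hypothetical, and Euclidean distance is an assumption.

```python
import math

def k_nearest(points, k):
    """For each point index, return the set of indices of its k nearest
    neighbors under Euclidean distance (assumed; brute force, O(n^2 log n))."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def mutual_knn_pairs(points, k):
    """Return index pairs (i, j), i < j, where i is among j's k nearest
    neighbors and j is among i's k nearest neighbors."""
    nn = k_nearest(points, k)
    return {
        (i, j)
        for i in range(len(points))
        for j in nn[i]
        if i < j and i in nn[j]
    }

# Two tight pairs plus one point halfway between them.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
pairs = mutual_knn_pairs(points, k=1)
```

In this toy input the middle point picks one of the outer points as its nearest neighbor, but no outer point picks it back, so it forms no mutual pair. That asymmetry is what makes the mutual relation stricter than plain kNN.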


Ma L.,Zhejiang University | Huang H.,National University of Singapore | He Q.,Zhejiang University | Chiew K.,Provident Technology Pte. Ltd. | Liu Z.,Zhejiang University
Journal of Intelligent Information Systems | Year: 2014

Local community detection aims at finding a community structure starting from a seed, which is a given vertex in a network, without global information; examples include online social networks that are too large and dynamic ever to be known fully. Nonetheless, the existing approaches to local community detection are usually sensitive to seeds, i.e., some seeds may cause true communities to be missed. In this paper, we present a seed-insensitive method called GMAC and its variation iGMAC for local community detection. They estimate the similarity among vertices by investigating vertices' neighborhoods, and reveal a local community by maximizing its internal similarity and minimizing its external similarity simultaneously. Extensive experimental results on both synthetic and real-world data sets verify the effectiveness of our algorithms. © 2014 Springer Science+Business Media New York.


Liu Z.,Zhejiang University | Chiew K.,Provident Technology Pte. Ltd. | He Q.,Zhejiang University | Huang H.,Zhejiang University | And 2 more authors.
Expert Systems with Applications | Year: 2014

Identifying statistically significant anomalies in an unlabeled data set is of key importance in many applications such as financial security and remote sensing. Rare category detection (RCD) helps address this issue by passing candidate data examples to a labeling oracle (e.g., a human expert) for labeling. A challenging task in RCD is to discover all categories without any prior information about the given data set. A few approaches have been proposed to address this issue, but they have quadratic or cubic time complexity w.r.t. the data set size N and require a considerable number of labeling queries, which involve time-consuming and expensive labeling effort by a human expert. In this paper, aiming at solutions with lower time complexity and fewer labeling queries, we propose two prior-free (i.e., without any prior information about a given data set) RCD algorithms, namely (1) iFRED, which achieves linear time complexity w.r.t. N, and (2) vFRED, which substantially reduces the number of labeling queries. This is done by tabulating each dimension of the data set into bins, followed by zooming out to shrink each bin down to a position and conducting wavelet analysis on the data density function to quickly locate the position (i.e., a bin) of a rare category, and then zooming in on the located bin to select candidate data examples for labeling. Theoretical analysis guarantees the effectiveness of our algorithms, and comprehensive experiments on both synthetic and real data sets further verify their effectiveness and efficiency. © 2014 Elsevier Ltd. All rights reserved.
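
The binning and wavelet steps described above can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the iFRED or vFRED procedure: it bins one dimension into equal-width bins and computes a single level of Haar detail coefficients over the bin counts, where a large magnitude marks an abrupt density change between adjacent bins. The function names, the equal-width binning, and the choice of the Haar wavelet are all assumptions made for illustration.

```python
def histogram(values, num_bins, lo, hi):
    """Count values into num_bins equal-width bins over [lo, hi].
    Assumes every value lies in [lo, hi]; hi itself goes in the last bin."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts

def haar_details(counts):
    """One level of Haar detail coefficients over adjacent bin pairs:
    d_i = (c_{2i} - c_{2i+1}) / sqrt(2). A trailing unpaired bin is ignored."""
    return [
        (counts[i] - counts[i + 1]) / 2 ** 0.5
        for i in range(0, len(counts) - 1, 2)
    ]

# A dense region near 0.1-0.2 next to a sparse one.
values = [0.1, 0.12, 0.15, 0.18, 0.2, 0.3, 0.9]
counts = histogram(values, num_bins=4, lo=0.0, hi=1.0)
details = haar_details(counts)
# Index of the bin pair with the sharpest density change.
sharpest = max(range(len(details)), key=lambda i: abs(details[i]))
```

Both functions make a single pass over their input, so the cost per dimension is linear in the number of values plus the number of bins.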


Chiew K.,Provident Technology Pte. Ltd. | Li Y.,Singapore Management University | Xu C.,Zhejiang University
Studies in Computational Intelligence | Year: 2014

Many reader/tag authentication protocols have been proposed to effectively authenticate tags and readers. However, using YA-TRAP as an example, we demonstrate how false authentications, in which a legitimate tag is wrongly rejected by a reader, may arise from these protocols when they are applied to C1G2 (class 1 generation 2) passive RFID tags. In this chapter, we identify a protocol pattern whose implementation on C1G2 passive tags leads to false authentications, and further identify three types of existing protocols that can give rise to false authentications because they contain this pattern. Moreover, we give a necessary and sufficient condition for preventing false authentications, and propose a naive semaphore-based solution which revises the pattern by adding semaphore operations so as to avoid false authentications. Our experiments demonstrate how false authentications arise and verify the effectiveness of our solution. © Springer International Publishing Switzerland 2014.
