Zhou J.,PLA University of Science and Technology |
Diao X.,Institute of Electronic System Engineering of China |
Cao J.,Nanjing Institute of Technology |
Zhou X.,PLA University of Science and Technology
2016 IEEE International Conference on Knowledge Engineering and Applications, ICKEA 2016 | Year: 2016
A variety of data dependencies have been proposed for data cleaning, including conditional functional dependencies (CFDs) and editing rules. Fixing rules (FRs) are a newly proposed class of data dependencies for data repairing with high accuracy. However, to our knowledge, no algorithms for automatically designing fixing rules have been developed. In this paper, a workflow for generating fixing rules is proposed, and an efficient algorithm for automatically producing fixing rules from constant CFDs is developed. The algorithm is proved to avoid generating conflicting FRs, and experiments show that it scales well with the size of the input and outputs a consistent set of FRs with fairly good usability. © 2016 IEEE.
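To make the notion of a fixing rule concrete, here is a minimal sketch. It is not the algorithm from the paper. It assumes the usual reading of a fixing rule: when a tuple matches the evidence pattern and its target attribute holds a known-wrong value (a negative pattern), the target is set to the fact. The names `FixingRule`, `rule_from_constant_cfd` and `apply_rule` are hypothetical, and harvesting negative patterns from violating rows is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass
class FixingRule:
    evidence: Dict[str, str]   # attribute -> constant the tuple must match
    target: str                # attribute to repair
    negative: FrozenSet[str]   # values known to be wrong for the target
    fact: str                  # correct value for the target


def rule_from_constant_cfd(lhs: Dict[str, str], target: str, value: str,
                           rows: Iterable[Dict[str, str]]) -> FixingRule:
    """Build a rule from the constant CFD (lhs -> target = value).

    Assumption: negative patterns are the target values of rows that
    match the left-hand side but violate the right-hand side.
    """
    negatives = {
        row[target]
        for row in rows
        if all(row.get(a) == v for a, v in lhs.items()) and row[target] != value
    }
    return FixingRule(dict(lhs), target, frozenset(negatives), value)


def apply_rule(rule: FixingRule, row: Dict[str, str]) -> Dict[str, str]:
    """Repair the row only when the evidence matches and the value is known-wrong."""
    matches = all(row.get(a) == v for a, v in rule.evidence.items())
    if matches and row.get(rule.target) in rule.negative:
        fixed = dict(row)
        fixed[rule.target] = rule.fact
        return fixed
    return row
```

Note the conservative behaviour: a value that is neither the fact nor a recorded negative pattern is left untouched, since the rule has no evidence that it is wrong.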
Ou W.-J.,Wuhan University |
Zeng C.,Wuhan University |
Zeng C.,Tsinghua University |
Xiang X.-M.,Wuhan University |
And 3 more authors.
Jisuanji Xuebao/Chinese Journal of Computers | Year: 2011
Cloud computing and service-oriented applications on the Internet are growing rapidly. At the same time, open platforms speed up the emergence of composite services, which are based on Cloud services. How to efficiently discover the services that users want has become a significant challenge. Although traditional approaches have made progress in recall and precision, they are not suitable for large-scale service discovery in a dynamic environment. In this article, a novel approach to service query based on concept relaxation is proposed, which exploits the semantic relations between concepts in a hierarchical ontology. Unrelated concepts and services are filtered out to improve the efficiency of the algorithm. The method has been implemented in a prototype of an on-demand service platform. Experimental results show that the proposed approach not only outperforms traditional ones but also meets the scalability requirements of service discovery.
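The idea of relaxing a query concept along a hierarchical ontology can be sketched as follows. This is an illustration under stated assumptions, not the method from the article: the ontology is reduced to a child-to-parent map, each service is annotated with a single concept, and relaxation simply generalizes the query one level at a time. The function names and the sample concepts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def ancestors(concept: str, parent: Dict[str, str]) -> List[str]:
    """Return the concept followed by its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def relaxed_query(concept: str, services: Dict[str, str],
                  parent: Dict[str, str],
                  max_steps: int) -> Tuple[Optional[int], List[str]]:
    """Generalize the query concept step by step until some service matches.

    A service matches a (relaxed) concept when the service's own concept
    is that concept or one of its descendants. Services under unrelated
    branches never match and are therefore skipped.
    """
    for steps, general in enumerate(ancestors(concept, parent)[:max_steps + 1]):
        hits = sorted(name for name, c in services.items()
                      if general in ancestors(c, parent))
        if hits:
            return steps, hits
    return None, []


# Hypothetical ontology: child -> parent
PARENT = {
    "SedanRental": "CarRental",
    "CarRental": "VehicleRental",
    "BikeRental": "VehicleRental",
    "VehicleRental": "Service",
    "HotelBooking": "Service",
}
```

Capping the number of relaxation steps keeps the query from degrading into "any service at all" once it reaches the root of the hierarchy.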
Gao B.,Harbin Institute of Technology |
Gao B.,Shenzhen Institute of Information Technology |
Zhang Q.-Y.,Harbin Institute of Technology |
Liang Y.-S.,Shenzhen Institute of Information Technology |
And 3 more authors.
Tongxin Xuebao/Journal on Communications | Year: 2011
A novel method based on empirical mode decomposition (EMD) and ARMA was proposed to model and forecast self-similar network traffic. The results demonstrate that EMD removes the long-range dependence (LRD) in traffic data. Therefore, the self-similar traffic processed by EMD could be modeled and predicted well using ARMA, which is a short-range dependent (SRD) model. Moreover, the complexity of the proposed method was reduced sharply, and its prediction precision was higher than that of a radial basis function neural network.
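The combination can be written out in standard notation. The EMD decomposition and the ARMA(p, q) form below are the textbook definitions; fitting a separate ARMA model to each component and summing the component forecasts is an assumption made here for illustration, since the abstract does not spell out that step.

```latex
% EMD: the traffic series x(t) is split into n intrinsic mode functions c_i(t) plus a residue r_n(t)
x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)

% ARMA(p, q) model for one component c_i(t), with white noise \varepsilon_{i,t}
c_i(t) = \sum_{j=1}^{p} \phi_{i,j}\, c_i(t-j) + \varepsilon_{i,t} + \sum_{k=1}^{q} \theta_{i,k}\, \varepsilon_{i,t-k}

% One-step forecast recombined from the component forecasts (assumed recombination)
\hat{x}(t+1) = \sum_{i=1}^{n} \hat{c}_i(t+1) + \hat{r}_n(t+1)
```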
Li G.,PLA University of Science and Technology |
Li G.,Xi'an Jiaotong University |
Pan Z.,PLA University of Science and Technology |
Xiao B.,Institute of Electronic System Engineering of China |
Huang L.,PLA University of Science and Technology
Intelligent Data Analysis | Year: 2014
Social networks are widespread and important in our daily life. Finding communities and revealing the characteristics of nodes within a community are crucial to understanding network structure and function. Many methods based on nonnegative matrix factorization (NMF) have been proposed to find communities, but their results vary with the initial condition, especially in weighted directed networks. In this paper, we first improve the NMF method by modeling the network as a weighted directed graph and using a diagonally dominant matrix as a constraint, which yields the community membership of each node as well as the interactions among communities. Furthermore, we propose methods to evaluate node importance and to characterize nodes within a community in order to analyze the network structure. Experiments on the Zachary club dataset and other real-world datasets demonstrate the superiority of our methods for community discovery over other related matrix factorization methods. The results show that our methods are applicable to both weighted directed and undirected models for community discovery, and that the results are more reliable. Experiments also yield meaningful results on node characteristics within communities. Together, these provide a useful way of analyzing social networks. © 2014 - IOS Press and the authors. All rights reserved.
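For orientation, the baseline that such methods build on can be sketched in a few lines. The code below is plain NMF with the standard multiplicative updates for the squared Frobenius error, applied to a weighted directed adjacency matrix; it does not include the diagonally dominant constraint described in the abstract. Reading a node's community as the largest entry in its row of W is a common convention assumed here, and all function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frob_error(a: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of A - WH."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def nmf(a: Matrix, k: int, iters: int = 200, seed: int = 0, eps: float = 1e-9):
    """Factor a nonnegative n x m matrix A into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def communities(w: Matrix) -> List[int]:
    """Assign each node to the community with the largest membership weight."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in w]
```

Because the factors start from random values, different seeds can give different memberships, which is the sensitivity to initial conditions that the abstract refers to.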
Li W.,Wuhan University |
Wang L.,Wuhan University |
Peng Z.,Wuhan University |
Li D.,Institute of Electronic System Engineering of China
Proceedings - 11th Web Information System and Application Conference, WISA 2014 | Year: 2014
Data streams with high volume and complicated items are becoming more and more common, and typical algorithms for finding top-k frequent items on streams, such as counter-based algorithms and sketch algorithms, increasingly fail to meet efficiency requirements. Our paper focuses on finding top-k frequent items on timestamp-based complicated streams and proposes an approximate solution based on sampling. Specifically, we design a multi-treap parallel priority algorithm to maintain a uniform sample over timestamp-based sliding windows. The top-k answers are approximated by processing the samples. We also theoretically analyze the relationship between item accuracy and sample size. Experimental analysis on real data shows that our method provides a flexible sample size to satisfy different accuracy requirements and ensures good running efficiency. © 2014 IEEE.
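The sampling idea can be illustrated with a deliberately simple sketch. It is not the multi-treap algorithm from the paper: it keeps every item of the current window in memory, assigns each a random priority, and takes the highest-priority items as a uniform sample without replacement, from which the top-k frequent items are estimated. The class name `WindowSampler` and its parameters are hypothetical.

```python
import random
from collections import Counter, deque
from typing import Deque, Hashable, List, Tuple


class WindowSampler:
    """Uniform sample over a timestamp-based sliding window via random priorities."""

    def __init__(self, window: float, sample_size: int, seed: int = 0):
        self.window = window
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        # (timestamp, priority, item), oldest first
        self.items: Deque[Tuple[float, float, Hashable]] = deque()

    def add(self, timestamp: float, item: Hashable) -> None:
        """Record an arrival; timestamps are assumed to be non-decreasing."""
        self.items.append((timestamp, self.rng.random(), item))

    def sample(self, now: float) -> List[Hashable]:
        """Drop expired items, then return the items with the highest priorities."""
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()
        best = sorted(self.items, key=lambda e: e[1], reverse=True)
        return [item for _, _, item in best[:self.sample_size]]

    def top_k(self, now: float, k: int) -> List[Hashable]:
        """Estimate the k most frequent items in the window from the sample."""
        return [item for item, _ in Counter(self.sample(now)).most_common(k)]
```

A larger `sample_size` trades memory and time for accuracy; when it covers the whole window, the estimate coincides with the exact answer.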
Li W.-F.,Wuhan University |
Peng Z.-Y.,Wuhan University |
Li D.-Y.,Institute of Electronic System Engineering of China
Ruan Jian Xue Bao/Journal of Software | Year: 2012
Efficient processing of Top-K queries has always been a significant technique in interactive environments involving massive amounts of data. With the emergence of imprecise data, its management has gradually attracted attention. In contrast with traditional Top-K queries, Top-K queries on uncertain data present different features in both semantics and computation. On the basis of prevailing uncertain data models and the possible world semantic model, researchers have already studied multiple sound semantics and efficient approaches. This survey describes and classifies Top-K processing techniques on uncertain data according to semantics, ranking criteria, algorithms, implementation levels, and so on. Finally, the challenges and future research trends in processing Top-K queries on uncertain data are predicted. © 2012 ISCAS.
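The possible world model can be made concrete with a small brute-force sketch. It assumes tuples that exist independently with a given probability and computes, for each tuple, the probability that it appears among the k highest-scoring tuples, summed over all possible worlds. This is one of several semantics discussed in the literature, and exhaustive enumeration is exponential, so it is only practical for tiny inputs. The function name is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def topk_probabilities(tuples: List[Tuple[str, float, float]], k: int) -> Dict[str, float]:
    """tuples: (name, score, existence probability), assumed independent.

    Returns, for each tuple, the total probability of the possible worlds
    in which it ranks among the k highest-scoring existing tuples.
    """
    result = {name: 0.0 for name, _, _ in tuples}
    # Each mask selects which tuples exist in one possible world.
    for mask in product([False, True], repeat=len(tuples)):
        world_prob = 1.0
        world = []
        for present, (name, score, prob) in zip(mask, tuples):
            world_prob *= prob if present else 1.0 - prob
            if present:
                world.append((score, name))
        for _, name in sorted(world, reverse=True)[:k]:
            result[name] += world_prob
    return result
```

For instance, a tuple with the highest score but existence probability 0.5 is in the top-1 answer in exactly the worlds where it exists, so its top-1 probability is 0.5.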
Yuan L.,National University of Defense Technology |
Yuan L.,Henan University of Technology |
Wang H.,National University of Defense Technology |
Yin G.,National University of Defense Technology |
And 3 more authors.
Proceedings - Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing in Conjunction with the UIC 2010 and ATC 2010 Conferences, UIC-ATC 2010 | Year: 2010
The participants in a software project have a significant impact on whether the project succeeds, and information about them can reflect some trustworthiness properties of the software. In this paper, the role configuration of a large number of OSS projects on SourceForge is analyzed, and some latent frequent patterns are discovered. This work prepares the ground for quantifying and utilizing software trustworthiness evidence derived from role information. © 2010 IEEE.