
Wu S., University of Science and Technology Beijing | Feng X.-D., University of Science and Technology Beijing | Shan Z.-G., State Information Center
Jisuanji Xuebao/Chinese Journal of Computers | Year: 2012

Missing-data processing is an important part of data pre-processing in data mining. Traditional methods for filling in missing data mostly rest on statistical hypotheses, such as an assumed probability distribution, and may not be the most applicable approaches for mining large data sets. Inspired by ROUSTIDA, an incomplete-data analysis approach that does not use probabilistic statistics, MIBOI is proposed for missing-data imputation based on clustering of incomplete data. A Constraint Tolerance Set Dissimilarity is defined for incomplete data sets of categorical variables, so that the overall dissimilarity of all incomplete data objects in a set can be computed directly, and missing values are imputed according to the incomplete-data clustering results. Empirical tests on UCI benchmark data sets show that MIBOI is effective and feasible.
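The abstract names a Constraint Tolerance Set Dissimilarity but does not give its formula, so the sketch below uses a plausible stand-in: missing values are treated as tolerant (compatible with any value), and only attributes where both records have known, conflicting values contribute to the dissimilarity. The `tolerance_dissimilarity` and `impute` functions and the nearest-donor fill rule are illustrative assumptions, not MIBOI itself.

```python
def tolerance_dissimilarity(x, y):
    """Dissimilarity that tolerates missing values (None): only
    attributes where both values are known and differ count."""
    return sum(a is not None and b is not None and a != b
               for a, b in zip(x, y))

def impute(records):
    """Fill each missing value from the most similar record
    (under the tolerant dissimilarity) that observes that attribute."""
    filled = [list(r) for r in records]
    for i, r in enumerate(records):
        for j, v in enumerate(r):
            if v is None:
                donors = [s for k, s in enumerate(records)
                          if k != i and s[j] is not None]
                if donors:
                    best = min(donors,
                               key=lambda s: tolerance_dissimilarity(r, s))
                    filled[i][j] = best[j]
    return filled

data = [
    ["red",  "small", "round"],
    ["red",  None,    "round"],   # size filled from the nearest record
    ["blue", "large", None],      # shape filled from the blue/large record
    ["blue", "large", "square"],
]
print(impute(data))
```

Running this fills the second record's size with "small" (its nearest donor agrees on every known attribute) and the third record's shape with "square", illustrating how tolerant matching lets incomplete records be grouped and completed without a distributional assumption.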


Wang J., Beihang University | Jiang Y., Beihang University | Ouyang Y., Beihang University | Li C., Beihang University | And 2 more authors.
IEICE Electronics Express | Year: 2013

TCP is a low-cost, easy-to-use transport-layer protocol widely used in datacenter-based applications and web services. Many TCP congestion control algorithms have been proposed to improve TCP performance in datacenter networks. However, emerging wireless technologies in datacenter networks create new problems for TCP congestion control. On the one hand, TCP algorithms must suit the special application patterns of datacenters (such as incast and Partition/Aggregate); on the other hand, TCP must perform well despite the random packet losses caused by unreliable wireless links. Although many TCP algorithms target either datacenter or wireless networks, designing a TCP algorithm that meets both the "datacenter" and "wireless" requirements of wireless datacenters remains a great challenge. In this paper, a novel congestion control algorithm is proposed to improve TCP performance in wireless datacenter environments. The proposed approach combines an ACK-based wireless bandwidth estimate with ECN-based datacenter TCP congestion control, so that it performs well both within the datacenter and over wireless links. Simulation experiments validate the performance of the proposed approach. © IEICE 2013.
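To picture the combination the abstract describes, here is a minimal, hypothetical sketch of such a hybrid: a DCTCP-style window cut scaled by the smoothed fraction of ECN-marked ACKs, plus a Westwood-style ACK-based bandwidth estimate used on packet loss, so that a random wireless loss falls back to the estimated bandwidth-delay product instead of blindly halving. The class name, gains, and update rules are illustrative assumptions, not the paper's algorithm.

```python
class HybridCC:
    def __init__(self, cwnd=10.0, g=1.0 / 16):
        self.cwnd = cwnd      # congestion window (segments)
        self.alpha = 0.0      # EWMA of the ECN-marked fraction (DCTCP-style)
        self.g = g            # EWMA gain for alpha
        self.bw_est = 0.0     # ACK-based bandwidth estimate (segments/s)

    def on_ack_batch(self, acked, marked, interval_s):
        """Process roughly one RTT's worth of ACKs."""
        # Westwood-style estimate: data delivered over the ACK interval.
        sample = acked / interval_s
        self.bw_est = 0.9 * self.bw_est + 0.1 * sample
        # DCTCP-style: track the fraction of ECN-marked packets.
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Congestion signalled by ECN: cut in proportion to alpha.
            self.cwnd *= (1 - self.alpha / 2)
        else:
            self.cwnd += 1  # additive increase per RTT

    def on_loss(self, rtt_min_s):
        """Random (wireless) loss: fall back to the estimated
        bandwidth-delay product instead of halving blindly."""
        self.cwnd = max(2.0, self.bw_est * rtt_min_s)

cc = HybridCC()
cc.on_ack_batch(acked=10, marked=3, interval_s=0.01)
print(round(cc.cwnd, 2), round(cc.bw_est, 1))  # mild ECN-driven cut
```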


Wang J., Beihang University | Li C., Beihang University | Xiong Z., Beihang University | Shan Z., State Information Center
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | Year: 2014

Motivated by the sustainable-development requirements of the global environment and of modern cities, the concept of the Smart City has been introduced as a strategic device for future urbanization on a global scale. Modern cities, meanwhile, have built up developed information infrastructure and gathered massive volumes of city-operation data, and are therefore ready for the arrival of Smart City concepts, technologies, and applications. An important peculiarity of the Smart City is that its technology system is data-centric: data science and technologies such as big data, data vitalization, and data mining play pivotal roles in Smart City technologies. In this paper, we provide a comprehensive survey of the most recent research activities in the data-centric Smart City. The survey takes an informatics perspective, and all of the summarized Smart City work is grounded in data science and technologies. The paper first summarizes the variety, and analyzes the features, of the urban data used in existing Smart City research and applications. Then the state of the art in data-centric Smart City research is surveyed from two aspects: research activities and research specialties. Research activities are introduced through system architectures, smart transportation, urban computing, and human mobility; research specialties are introduced through core technologies and theory, interdisciplinarity, data-centricity, and regional features. Finally, the paper raises some directions for future work.


Wen Z., Beijing Normal University | Wen Z., University of Pittsburgh | Wen Z., State Information Center | Liang X., University of Pittsburgh | And 2 more authors.
Water Resources Research | Year: 2012

A new multiscale routing framework is developed and coupled with the Hydrologically based Three-layer Variable Infiltration Capacity (VIC-3L) land surface model (LSM). The new framework is designed to reduce the impact of different scales, in both space and time, on routing results. It has been applied to three river basins at six spatial resolutions and two temporal resolutions, and its results are compared with those of a routing scheme whose flow network is generated by the widely used eight-direction (D8) method, in order to evaluate the framework's capability to reduce the impacts of spatial and temporal resolution. The results show that the new framework is significantly less affected by spatial resolution than the D8-based scheme. Compared at the basin outlets with the instantaneous unit hydrograph (IUH) method, which in principle has the least spatial-resolution impact on routing results, the new framework produces similar results; unlike the IUH method, however, it also provides routing information at interior locations of a basin and along river channels. The new framework likewise reduces the impact of temporal resolution: the spiky hydrographs that a typical routing method produces under different temporal resolutions can be significantly reduced. © 2012. American Geophysical Union. All Rights Reserved.
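For context on the unit-hydrograph baseline the framework is compared against, the sketch below routes a runoff series to a basin outlet by convolving it with a discretized unit hydrograph. The gamma-shaped IUH and its parameter values are illustrative assumptions, not the paper's configuration; the point is that the outlet response is a fixed, lumped convolution, which is why the IUH method cannot report flow at interior locations.

```python
import math

def gamma_iuh(n_steps, shape=3.0, scale=2.0, dt=1.0):
    """Discretized gamma-distribution instantaneous unit hydrograph."""
    u = [(t ** (shape - 1)) * math.exp(-t / scale)
         for t in (dt * (i + 0.5) for i in range(n_steps))]
    total = sum(u)
    return [v / total for v in u]  # normalize so runoff mass is conserved

def route(runoff, iuh):
    """Convolve a runoff time series with the IUH to get outlet flow."""
    q = [0.0] * (len(runoff) + len(iuh) - 1)
    for i, r in enumerate(runoff):
        for j, u in enumerate(iuh):
            q[i + j] += r * u
    return q

runoff = [0, 5, 20, 10, 2, 0, 0, 0]   # e.g. mm of runoff per time step
print(route(runoff, gamma_iuh(12)))   # attenuated, delayed outlet hydrograph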


Wang G., Guangxi University | Wang G., Central South University | Shan Z., State Information Center
Information Processing Letters | Year: 2012

Mesh networks have been used to build large-scale multicomputer systems and Networks-on-Chip (NoCs). Under worst-case analysis, mesh networks tolerate faults poorly, so it is of practical importance for multicomputer-system and NoC manufacturers to determine a lower bound on the connectivity probability of a mesh network when the node failure probability and the network size are given. In this paper, we study this problem with a k-submesh model under two fault models, in which each node fails with either a uniform or a nonuniform probability. We develop a novel technique to formally derive a lower bound on the connectivity probability of mesh networks. Our study shows that mesh networks of practical size can tolerate a large number of faulty nodes while maintaining a high connectivity probability, and are thus reliable and trustworthy enough for multicomputer systems and NoCs. For example, to build a mesh network of 40,000 nodes (e.g., M200×200) with a required network connectivity probability of 99%, we only need to bound the uniform node failure probability by 0.25%. For the same network M200×200, the connectivity probability remains 95.88% even if the network runs uninterrupted for one million seconds under exponentially distributed node failures with a failure rate on the order of 10^-9. © 2012 Elsevier B.V. All rights reserved.
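Reading "failure rate on the order of 10^-9" as an exponential lifetime with rate λ ≈ 10^-9 per second, each node's failure probability after t = 10^6 seconds is 1 − e^(−λt) ≈ 0.1%. The Monte Carlo sketch below is a brute-force cross-check of this kind of figure: it samples node failures at a uniform probability and tests whether the surviving nodes of an n×n mesh form a single connected component. It is an illustrative estimate on a smaller mesh (50×50 keeps the demo fast), not the paper's k-submesh lower-bound derivation.

```python
import random
from collections import deque

def mesh_connected(n, p_fail, rng):
    """Sample node failures, then test whether the surviving nodes
    of the n x n mesh form one connected component."""
    alive = {(i, j) for i in range(n) for j in range(n)
             if rng.random() >= p_fail}
    if not alive:
        return False
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    while queue:                          # BFS over surviving neighbors
        i, j = queue.popleft()
        for v in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(alive)

rng = random.Random(0)
n, p_fail, trials = 50, 0.0025, 200
ok = sum(mesh_connected(n, p_fail, rng) for _ in range(trials))
print(f"connectivity ~ {ok / trials:.2%} at p_fail = {p_fail:.2%}")
```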
