Du J.,State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System | Du J.,State Key Laboratory of High Performance Computing | Ao F.,State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System | Sui S.,State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System | Wang H.,State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System
Proceedings - 2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2015 | Year: 2015

Real-time synthetic aperture radar (SAR) imaging has recently become a research hotspot in remote sensing and military applications. Because SAR imaging algorithms are both data intensive and computation intensive, they are well suited to acceleration on hybrid storage systems such as clusters. Designing a high-performance SAR algorithm first requires maximizing its parallelizability, so that it can exploit the multi-level parallelism of the cluster platform. Focusing on large-scale data, we explore the concurrency characteristics of the SAR imaging algorithm on a hybrid storage system and propose several parallel optimization techniques to accelerate it. Based on this study, we implement a parallel SAR imaging algorithm and evaluate its performance. Experimental results show that the optimized SAR imaging program makes effective use of the high-speed network and achieves a clear performance improvement. © 2015 IEEE.


Shen L.,State Key Laboratory of High Performance Computing | Shen L.,National University of Defense Technology | Xu F.,State Key Laboratory of High Performance Computing | Xu F.,National University of Defense Technology | And 2 more authors.
Journal of Computer Science and Technology | Year: 2016

Thread-level speculation (TLS) provides not only a simple parallel programming model but also an effective mechanism for exploiting thread-level parallelism. The performance of software speculative parallel models is limited by high global overheads caused by different types of loops. These loops usually have different dependence characteristics and call for different optimization strategies. In this paper, we propose three comprehensive optimization techniques that reduce different sources of global overhead, each aimed at the requirements of a different type of loop. Inter-thread fetching reduces the high mis-speculation rate of loops with frequent dependencies; out-of-order committing reduces the control overhead of loops with infrequent dependencies; and enhanced dynamic task granularity resizing reduces the control overhead and optimizes the global overhead of loops whose dependence characteristics change over time. All three techniques have been implemented in HEUSPEC, a software TLS system. Experimental results indicate that they satisfy the demands of different groups of benchmarks. Combining the techniques improves the performance of all benchmarks and reaches a higher average speedup. © 2016, Springer Science+Business Media New York.


Mao H.,National University of Defense Technology | Mao H.,State Key Laboratory of High Performance Computing | Xiao N.,National University of Defense Technology | Xiao N.,State Key Laboratory of High Performance Computing | And 2 more authors.
Communications in Computer and Information Science | Year: 2012

Because of its convenience, cloud storage is now widely used in daily life. However, such services sometimes suffer from availability problems: they may not be accessible in time for reasons such as service outages or blocked network connections. To overcome the problems and challenges of backing up data on cloud services, we propose a new storage architecture, RAIC, which treats each cloud storage service like a disk and combines them into a RAID-like system to provide users with a highly available storage service. We have designed and implemented a prototype system for RAIC. Our evaluation shows that RAIC performs efficiently: upload performance is about 90.6% of the ideal upload bandwidth, and the corresponding figure for download is 74.2%. © 2012 Springer-Verlag.
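
The RAID-like idea can be illustrated with a minimal sketch. It assumes a single XOR parity chunk in the spirit of RAID 4/5; the abstract does not say which layout RAIC actually uses, and the function names `stripe_with_parity` and `recover` are hypothetical. Each returned chunk would be uploaded to a different cloud provider, so that any one provider can be unavailable without losing data.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data: bytes, n_data: int) -> List[bytes]:
    """Split data into n_data equal-size chunks plus one XOR parity chunk.

    Hypothetical layout: chunk i would go to cloud provider i,
    and the parity chunk to one further provider.
    """
    chunk_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(chunk_len * n_data, b"\0")  # zero-pad the tail
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(chunks: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one chunk (None) is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing chunk")
    restored = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # XOR of all surviving chunks reproduces the missing one.
        restored[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity chunk and the zero padding.
    return b"".join(restored[:-1])[:original_len]
```

With three data chunks and one parity chunk, losing access to any single provider still allows a full download, at the cost of one extra chunk of storage and upload traffic.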


Mao H.,State Key Laboratory of High Performance Computing | Mao H.,National University of Defense Technology | Zhang H.,State Key Laboratory of High Performance Computing | Zhang H.,National University of Defense Technology | And 7 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

People increasingly use multiple machines for their daily work. Because platform switching and file modification are so frequent, a way to synchronize files across multiple machines is required to keep the files consistent. In this paper, we propose EaSync, a transparent file synchronization service across multiple machines. EaSync introduces several key technologies for synchronization-oriented services, including a timestamp-based synchronization protocol and an enhanced deduplication algorithm, DS-Dedup. We implement and evaluate the EaSync prototype system. The results show that EaSync outperforms other synchronization systems in operation latency and other metrics. © IFIP International Federation for Information Processing 2012.
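
As a rough illustration of why deduplication helps file synchronization, the sketch below shows generic fixed-size, content-hash deduplication. It is not DS-Dedup, whose details the abstract does not give; the names `dedup_upload` and `restore` and the in-memory `store` dictionary are assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def dedup_upload(data: bytes, store: Dict[str, bytes], chunk_size: int = 4096) -> List[str]:
    """Split data into fixed-size chunks and store only chunks not seen before.

    Returns the "recipe": the ordered list of chunk digests needed to
    rebuild the file. `store` stands in for the server-side chunk store.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # only new content is transferred
        recipe.append(digest)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild a file from its recipe and the shared chunk store."""
    return b"".join(store[digest] for digest in recipe)
```

When a file is modified on one machine, only the chunks whose digests are new need to be sent; unchanged chunks are referenced by digest alone.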


Wang C.,National University of Defense Technology | Wang C.,State Key Laboratory of High Performance Computing | Lu Y.,National University of Defense Technology | Lu Y.,State Key Laboratory of High Performance Computing | And 4 more authors.
Proceedings of 2015 IEEE International Conference on Computer and Communications, ICCC 2015 | Year: 2015

Graph traversal is widely used in a variety of fields, including social networks, business analytics, and high-performance computing. Graph traversal on single nodes has been well studied and optimized on modern CPU architectures. Heterogeneous computing is now becoming increasingly popular, and CPU+MIC is a typical heterogeneous architecture. The Intel MIC (Many Integrated Core) has up to 57 cores and has not been fully evaluated for graph traversal. When a MIC is used to traverse a graph, it may suffer from load imbalance because vertex degrees in a graph can differ greatly, which can degrade system performance. In this paper, we therefore present an algorithmic design and optimization techniques for load balancing on the MIC. The main idea of the optimization is to treat high-degree vertices and low-degree vertices separately, which requires some adjustments to existing algorithms and data structures. As shown in Section VI, the design achieves substantial performance improvements over the BFS algorithm without load balancing on the MIC. We believe that this novel algorithm can be successfully applied to a broader class of graph algorithms with many MIC cores. © 2015 IEEE.
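
The idea of handling high-degree and low-degree vertices separately can be sketched as follows. This is a sequential illustration only, not the paper's MIC implementation: the degree `threshold`, the adjacency-list dictionary, and the function names are assumptions. In a parallel version, the edges of each heavy vertex would be divided among several threads, while light vertices would be handed out whole.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency lists keyed by vertex id


def split_frontier(graph: Graph, frontier: List[int], threshold: int) -> Tuple[List[int], List[int]]:
    """Partition the frontier into high-degree and low-degree vertices."""
    heavy = [v for v in frontier if len(graph[v]) >= threshold]
    light = [v for v in frontier if len(graph[v]) < threshold]
    return heavy, light


def bfs_degree_split(graph: Graph, source: int, threshold: int = 4) -> Dict[int, int]:
    """Level-synchronous BFS that processes heavy and light vertices separately.

    Returns the BFS distance of every vertex reachable from `source`.
    """
    dist = {source: 0}
    frontier = [source]
    while frontier:
        heavy, light = split_frontier(graph, frontier, threshold)
        next_frontier = []
        # Heavy vertices first: their long edge lists dominate the work
        # and are the ones a parallel version would split across threads.
        for v in heavy + light:
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    next_frontier.append(w)
        frontier = next_frontier
    return dist
```

Because every vertex in a frontier is at the same BFS level, reordering the frontier into heavy and light groups does not change the computed distances; it only changes how the work could be scheduled.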
