National Supercomputer Center in Tianjin
Yu J.,National University of Defense Technology |
Liu G.,National University of Defense Technology |
Liu G.,National Supercomputer Center in Tianjin |
Dong W.,National University of Defense Technology |
Li X.,National University of Defense Technology
Proceedings - 3rd IEEE International Conference on Big Data Security on Cloud, BigDataSecurity 2017, 3rd IEEE International Conference on High Performance and Smart Computing, HPSC 2017 and 2nd IEEE International Conference on Intelligent Data and Security, IDS 2017 | Year: 2017
High-performance computers (HPC) are used to solve increasingly complex problems and to process ever larger amounts of data. The growing computational requirements of applications can be met by using more compute nodes. However, the average I/O performance available to each compute node falls as the number of nodes increases. The performance gap between computation and I/O has long been a primary issue affecting application performance. Distributed memory caches have been proposed to narrow this gap by caching data in the memory of multiple compute nodes. However, previous approaches did not fully optimize the performance of accessing locally cached data. We design and implement a locality-enhanced distributed memory cache (LeCache) to address this problem. LeCache separates the location of metadata from that of data, which enables data to be cached preferentially in local memory. The proposed metadata caching strategy further minimizes the overhead of querying metadata remotely. We conduct an extensive evaluation with IOR and BTIO on Tianhe-1A. The results show that LeCache has a significant performance advantage under various kinds of workloads. © 2017 IEEE.
Cai G.,National University of Defense Technology |
Hu W.,National University of Defense Technology |
Liu G.,National University of Defense Technology |
Liu G.,National Supercomputer Center in Tianjin |
And 3 more authors.
International Conference on Advanced Communication Technology, ICACT | Year: 2017
As supercomputer systems scale up, the performance gap between the compute and storage systems increases dramatically. The traditional speedup measures only the performance of the compute system. In this paper, we first propose a speedup metric, the storage speedup, that takes the I/O constraint into account. The new metric unifies computing and I/O performance and evaluates the practical speedup of a parallel application under the limitations of the I/O system. Furthermore, this paper classifies and analyzes existing parallel systems according to the proposed speedup metric and makes suggestions on system design and application optimization. Building on the storage speedup, we also generalize these results into a general storage speedup by considering not only speedup but also costup. Finally, we analyze these new speedup metrics through case studies. The storage speedup reflects the degree to which the scalability of a parallel application is affected by the performance of the storage system. The results indicate that the proposed speedups are effective metrics for parallel applications. © 2017 Global IT Research Institute - GiRI.
Wu L.-K.,Jiangsu University |
Wu L.-K.,National Supercomputer Center in Tianjin |
Meng X.-F.,Jiangsu University |
Meng X.-F.,National Supercomputer Center in Tianjin
Physical Review D | Year: 2017
Results on the locations of the tricritical points of N_f = 2 lattice QCD with imaginary chemical potential are presented. Simulations are carried out with the Symanzik improved gauge action and the Asqtad fermion action. With imaginary chemical potential iμ_I = iπT, previous studies show that the Roberge-Weiss (RW) transition endpoints are triple points at both large and small quark masses, and second-order transition points at intermediate quark masses. The triple and second-order endpoints are separated by two tricritical points. Our simulations are carried out at 7 values of the quark mass am ranging from 0.024 to 0.070 on lattice volumes 12^3×4, 16^3×4, and 20^3×4. The susceptibility and Binder cumulant of the imaginary part of the Polyakov loop are employed to determine the nature of the RW transition endpoints. The simulations suggest that the two tricritical points lie within the ranges 0.024-0.026 and 0.040-0.050, respectively. © 2017 American Physical Society.
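The two observables named in this abstract can be illustrated with a minimal sketch. The abstract does not state the definitions used, so the code assumes the common conventions: susceptibility as the spatial volume times the variance of the observable, and the Binder cumulant as the fourth central moment divided by the squared second central moment. The function name and the toy input are hypothetical.

```python
from statistics import fmean


def susceptibility_and_binder(samples, spatial_volume):
    """Return (susceptibility, Binder cumulant) for a list of measurements.

    Assumed conventions (not taken from the abstract):
      chi = V * <(dX)^2>
      B4  = <(dX)^4> / <(dX)^2>^2
    where dX = X - <X> and V is the spatial lattice volume.
    """
    mean = fmean(samples)
    deltas = [x - mean for x in samples]
    m2 = fmean([d ** 2 for d in deltas])  # second central moment
    m4 = fmean([d ** 4 for d in deltas])  # fourth central moment
    if m2 == 0:
        raise ValueError("observable does not fluctuate; B4 is undefined")
    return spatial_volume * m2, m4 / m2 ** 2


# Toy input: an observable that jumps between two symmetric values.
chi, b4 = susceptibility_and_binder([-1.0, 1.0, -1.0, 1.0], 12 ** 3)
```

In practice the samples would be measurements of the imaginary part of the Polyakov loop on each configuration, and the cumulant would be compared across the three lattice volumes.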
Guo H.,Peking University |
Zhang J.,Peking University |
Liu R.,Peking University |
Liu L.,Peking University |
And 4 more authors.
IEEE Transactions on Visualization and Computer Graphics | Year: 2014
When computing integral curves and integral surfaces for large-scale unsteady flow fields, a major bottleneck is the widening gap between data access demands and the available bandwidth (both I/O and in-memory). In this work, we explore a novel advection-based scheme to manage flow field data for both efficiency and scalability. The key is to first partition the flow field into blocklets (e.g. cells or very fine-grained blocks of cells), and then (pre)fetch and manage blocklets on demand using a parallel key-value store. The benefits are (1) greatly increasing the scale of local-range analysis (e.g. source-destination queries, streak surface generation) that can fit within any given limit of hardware resources; and (2) improving memory and I/O bandwidth efficiency as well as the scalability of naive task-parallel particle advection. We demonstrate our method using a prototype system that works on workstations and also in supercomputing environments. Results show significantly reduced I/O overhead compared to accessing raw flow data, and also high scalability on a supercomputer for a variety of applications. © 2014 IEEE.
Agency: European Commission | Branch: FP7 | Program: CSA | Phase: ICT-2011.3.4 | Award Amount: 447.15K | Year: 2012
In 2010, the TOP500 project, which has ranked and detailed the 500 most powerful known computer systems in the world since 1993, announced that the world's most powerful computer system was Tianhe 1A in China (http://en.wikipedia.org/wiki/TOP500). This project aims at establishing a strategic collaboration with the host and developer of this computer system in China to explore a range of research issues, which can be highlighted as: (i) further testing and evaluation with complex computing tasks, especially those in the areas of modelling, simulation, visualization and imaging, in order to identify a range of research challenges for further development of computing systems and their applications; (ii) discussion through a series of targeted workshops and seminars to explore and generate ideas for further developing supercomputer architectures, algorithms, configurations, and other important issues across the boundaries of software engineering, distributed computing, cloud computing, and grid computing; (iii) exchange visits and personnel exchanges to develop the discussed ideas into project proposals and research programmes; (iv) joint publications and other dissemination activities; (v) establishing long-term collaborations to address ambitious and challenging research issues. SCC-Computing has drawn together a strong consortium with complementary expertise and multi-disciplinary research know-how to ensure successful delivery of this project, leading to fruitful discussion and the initiation of new ideas for further research on supercomputing systems.
Jiang Y.-H.,Duke University |
Yuen R.K.C.,Applied Genomics |
Jin X.,BGI Shenzhen |
Jin X.,Children's Hospital of Philadelphia |
And 43 more authors.
American Journal of Human Genetics | Year: 2013
Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. © 2013 The Authors.
Jia J.,Chinese Academy of Agricultural Sciences |
Zhao S.,BGI Shenzhen |
Zhao S.,Chinese University of Hong Kong |
Kong X.,Chinese Academy of Agricultural Sciences |
And 46 more authors.
Nature | Year: 2013
About 8,000 years ago in the Fertile Crescent, a spontaneous hybridization of the wild diploid grass Aegilops tauschii (2n = 14; DD) with the cultivated tetraploid wheat Triticum turgidum (2n = 4x = 28; AABB) resulted in hexaploid wheat (T. aestivum; 2n = 6x = 42; AABBDD). Wheat has since become a primary staple crop worldwide as a result of its enhanced adaptability to a wide range of climates and improved grain quality for the production of baker's flour. Here we describe sequencing the Ae. tauschii genome and obtaining a roughly 90-fold depth of short reads from libraries with various insert sizes, to gain a better understanding of this genetically complex plant. The assembled scaffolds represented 83.4% of the genome, of which 65.9% comprised transposable elements. We generated comprehensive RNA-Seq data and used it to identify 43,150 protein-coding genes, of which 30,697 (71.1%) were uniquely anchored to chromosomes with an integrated high-density genetic map. Whole-genome analysis revealed gene family expansion in Ae. tauschii of agronomically relevant gene families that were associated with disease resistance, abiotic stress tolerance and grain quality. This draft genome sequence provides insight into the environmental adaptation of bread wheat and can aid in defining the large and complicated genomes of wheat species. © 2013 Macmillan Publishers Limited. All rights reserved.
Zhu X.,National University of Defense Technology |
Zhu X.,National Supercomputer Center in Tianjin |
Liu X.,National University of Defense Technology |
Meng X.,National Supercomputer Center in Tianjin |
Feng J.,National Supercomputer Center in Tianjin
2011 International Conference on Electrical and Control Engineering, ICECE 2011 - Proceedings | Year: 2011
In this study, we test and analyze the performance of the Gyrokinetic Toroidal Code (GTC) program. Based on the analysis results, we port GTC's compute-intensive subroutines to the GPU and accelerate them on the CPU-GPU heterogeneous architecture of the TH-1A supercomputer. Several optimization strategies are developed in this process: subroutines are merged to reduce data transfer between host and device, GPU memory access is optimized to reduce access latency, and the static keyword is added before array declarations to avoid unnecessary address allocation and data copying. Experimental results show that the subroutines ported to the GPU run 6 to 8 times faster, and that the overall performance of GTC can be improved by a factor of 3 to 4. © 2011 IEEE.
Fei Z.,National University of Defense Technology |
Guang-Ming L.,National Supercomputer Center in Tianjin
Proceedings - 2013 IEEE 9th International Conference on Intelligent Computer Communication and Processing, ICCP 2013 | Year: 2013
Traffic congestion is among the most intractable problems facing the governments of most countries. To relieve congestion, it is crucial that traffic departments obtain real-time information on traffic conditions and release it to the public. The front cameras of buses record traffic conditions on the main roads of a city, so real-time traffic conditions can be obtained by analyzing the surveillance video from these cameras. We design a model for collecting images from the camera and preprocess these images to simplify subsequent processing. We propose the idea of an average background and use background subtraction to detect vehicles. We also propose a method for calculating the weight of each image pixel under the law of perspective, together with a congestion index based on vehicle pixel statistics, and we divide traffic congestion into levels according to this index. Finally, an experiment is designed to validate the methods. © 2013 IEEE.
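The pipeline described in this abstract (average background, background subtraction, perspective-weighted pixel statistics, congestion levels) can be sketched roughly as follows. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the difference threshold, the per-row perspective weights, the index as a weighted foreground fraction, and the level boundaries. Frames are plain nested lists of grayscale values.

```python
def average_background(frames):
    """Pixel-wise mean over a list of equally sized grayscale frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def vehicle_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than threshold.

    The threshold value is a hypothetical choice, not from the abstract.
    """
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def congestion_index(mask, row_weights):
    """Weighted fraction of foreground pixels, in the range 0..1.

    row_weights is an assumed stand-in for the perspective weighting:
    rows imaging distant road get larger weights because each pixel
    there covers more road surface.
    """
    total = sum(w * len(row) for w, row in zip(row_weights, mask))
    covered = sum(w * sum(row) for w, row in zip(row_weights, mask))
    return covered / total


def congestion_level(index, bounds=(0.2, 0.5)):
    """Map the index to a level 0, 1, 2, ... using hypothetical bounds."""
    return sum(index >= b for b in bounds)


background = average_background([[[10, 10], [10, 10]], [[12, 12], [12, 12]]])
mask = vehicle_mask([[11, 200], [11, 11]], background)
index = congestion_index(mask, row_weights=[2.0, 1.0])
level = congestion_level(index)
```

A real system would operate on camera frames rather than 2×2 toy arrays and would have to compensate for the motion of the bus itself, which this sketch ignores.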
Liu C.,Tianjin University |
Shu Y.,Tianjin University |
Yang O.,University of Ottawa |
Xia Z.,National Supercomputer Center in Tianjin |
Xia R.,Tianjin University
Wireless Personal Communications | Year: 2013
A stable and reliable routing mechanism for vehicular ad hoc networks (VANETs) is an important step toward supporting applications that require long data transmissions, such as file sharing and music download. Traditional mobile ad hoc network (MANET) routing protocols are not suitable for VANETs because the mobility model and environment of a VANET differ from those of a traditional MANET. To solve this problem, we propose a new stable routing algorithm, called stable directional forward routing. The novelty of the proposed protocol lies in combining directional broadcast and path duration prediction with the ad hoc on-demand distance vector routing protocol, as follows: (1) nodes in the VANET are grouped by position, and only nodes within a given direction range participate in the route discovery process, which reduces the frequency of flooded requests; (2) route selection is based on link duration rather than hop count or other metrics, in order to increase the path duration; (3) route discovery is executed before the path expires, in order to decrease the end-to-end delay. The performance of the new scheme is evaluated through extensive simulations with Qualnet. Simulation results indicate the benefits of the proposed routing strategy in terms of fewer routing control packets, fewer link-breakage events, a higher packet delivery ratio, and a lower end-to-end delay. © 2013 Springer Science+Business Media New York.
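Selecting routes by link duration rather than hop count can be illustrated with a small sketch. The abstract does not say how link duration is predicted, so the code assumes a simple model: both vehicles keep constant velocity and the link breaks once their distance exceeds a fixed radio range. It also assumes a path lasts as long as its shortest-lived link. All function names and numbers are hypothetical.

```python
import math


def link_duration(p1, v1, p2, v2, radio_range):
    """Predicted time until two constant-velocity nodes leave radio range.

    Solves |dp + dv * t| = radio_range for the largest t >= 0.
    Returns 0.0 if the nodes are already out of range and math.inf
    if their relative velocity is zero.
    """
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    a = dvx ** 2 + dvy ** 2
    b = 2 * (dpx * dvx + dpy * dvy)
    c = dpx ** 2 + dpy ** 2 - radio_range ** 2
    if c > 0:
        return 0.0
    if a == 0:
        return math.inf
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def select_route(routes):
    """Pick the route whose weakest link is predicted to last longest.

    routes maps a route name to the list of its link durations.
    """
    return max(routes, key=lambda name: min(routes[name]))


# Two vehicles 100 m apart, moving in the same direction at 10 and 20 m/s.
duration = link_duration((0, 0), (10, 0), (100, 0), (20, 0), radio_range=250)

# The two-hop route breaks sooner than the three-hop one.
best = select_route({"short": [15.0, 4.0], "long": [8.0, 9.0, 12.0]})
```

With these inputs a hop-count metric would prefer the two-hop route, while the duration-based choice prefers the three-hop route because its weakest link lasts longer.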