Guo H.,Peking University |
Zhang J.,Peking University |
Liu R.,Peking University |
Liu L.,Peking University |
And 4 more authors.
IEEE Transactions on Visualization and Computer Graphics | Year: 2014
When computing integral curves and integral surfaces for large-scale unsteady flow fields, a major bottleneck is the widening gap between data access demands and the available bandwidth (both I/O and in-memory). In this work, we explore a novel advection-based scheme to manage flow field data for both efficiency and scalability. The key is to first partition the flow field into blocklets (e.g., cells or very fine-grained blocks of cells), and then (pre)fetch and manage blocklets on demand using a parallel key-value store. The benefits are (1) greatly increasing the scale of local-range analysis (e.g., source-destination queries, streak surface generation) that can fit within any given limit of hardware resources; and (2) improving memory and I/O bandwidth efficiency as well as the scalability of naive task-parallel particle advection. We demonstrate our method using a prototype system that works on workstations and also in supercomputing environments. Results show significantly reduced I/O overhead compared to accessing raw flow data, as well as high scalability on a supercomputer for a variety of applications. © 2014 IEEE.
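A minimal Python sketch of on-demand blocklet access during particle advection, under several assumptions not taken from the paper: a 2-D field, square blocklets that each hold a single constant velocity, a plain dict standing in for the parallel key-value store, LRU eviction (the abstract does not name an eviction policy), and no prefetching. `BlockletCache` and `advect` are hypothetical names. The point it illustrates is that advection only reads the blocklets along the particle path, and memory use is bounded by the cache capacity rather than by the size of the field.

```python
from collections import OrderedDict
from typing import Dict, Tuple

Vec = Tuple[float, float]
Key = Tuple[int, int]


class BlockletCache:
    """LRU cache of blocklets fetched on demand from a key-value store.

    Assumption: each blocklet is a square of side `blocklet_size` holding a
    single constant velocity; `store` is any mapping Key -> Vec standing in
    for the parallel key-value store.
    """

    def __init__(self, store: Dict[Key, Vec], capacity: int, blocklet_size: float):
        self.store = store
        self.capacity = capacity
        self.size = blocklet_size
        self.cache: "OrderedDict[Key, Vec]" = OrderedDict()
        self.fetches = 0  # cache misses, i.e. reads from the store

    def key_of(self, p: Vec) -> Key:
        # Map a position to the key of the blocklet that contains it.
        return (int(p[0] // self.size), int(p[1] // self.size))

    def velocity(self, p: Vec) -> Vec:
        key = self.key_of(p)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
        else:
            self.fetches += 1
            self.cache[key] = self.store[key]  # fetch only the blocklet needed
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[key]


def advect(cache: BlockletCache, p: Vec, dt: float, steps: int) -> Vec:
    # Forward-Euler particle advection; each step reads one blocklet.
    x, y = p
    for _ in range(steps):
        u, v = cache.velocity((x, y))
        x, y = x + dt * u, y + dt * v
    return (x, y)


store = {(i, j): (1.0, 0.0) for i in range(4) for j in range(4)}
cache = BlockletCache(store, capacity=2, blocklet_size=1.0)
end = advect(cache, (0.5, 0.5), dt=0.25, steps=8)
print(end, cache.fetches, len(cache.cache))  # (2.5, 0.5) 3 2
```

Only 3 of the 16 blocklets are ever read, and at most 2 are resident at once; a real system would also prefetch blocklets ahead of the particle front.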
Agency: European Commission | Branch: FP7 | Program: CSA | Phase: ICT-2011.3.4 | Award Amount: 447.15K | Year: 2012
In 2010, the TOP500 project, which has ranked and detailed the 500 most powerful known computer systems in the world since 1993, announced that the world's most powerful computer system was Tianhe-1A in China (http://en.wikipedia.org/wiki/TOP500). This project aims to establish a strategic collaboration with the host and developer of this computer system in China to explore a range of research issues, highlighted as follows: (i) further testing and evaluation with complex computing tasks, especially in the areas of modelling, simulation, visualization and imaging, and hence identification of research challenges for further development of computing systems and their applications; (ii) a series of targeted workshops and seminars to explore and generate ideas for further developing supercomputer architectures, algorithms, configurations, and other important issues across the boundaries of software engineering, distributed computing, cloud computing, and grid computing; (iii) exchange visits and personnel exchanges to develop the discussed ideas into project proposals and research programmes; (iv) joint publications and other dissemination activities; and (v) establishing long-term collaborations to address ambitious and challenging research issues. SCC-Computing has assembled a strong consortium with complementary expertise and multidisciplinary research know-how to ensure successful delivery of this project, leading to fruitful discussion and the initiation of new ideas for further research on supercomputing systems.
Jiang Y.-H.,Duke University |
Yuen R.K.C.,Applied Genomics |
Jin X.,BGI Shenzhen |
Jin X.,Children's Hospital of Philadelphia |
And 43 more authors.
American Journal of Human Genetics | Year: 2013
Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. © 2013 The Authors.
Jia J.,Chinese Academy of Agricultural Sciences |
Zhao S.,BGI Shenzhen |
Zhao S.,Chinese University of Hong Kong |
Kong X.,Chinese Academy of Agricultural Sciences |
And 46 more authors.
Nature | Year: 2013
About 8,000 years ago in the Fertile Crescent, a spontaneous hybridization of the wild diploid grass Aegilops tauschii (2n = 14; DD) with the cultivated tetraploid wheat Triticum turgidum (2n = 4x = 28; AABB) resulted in hexaploid wheat (T. aestivum; 2n = 6x = 42; AABBDD). Wheat has since become a primary staple crop worldwide as a result of its enhanced adaptability to a wide range of climates and improved grain quality for the production of baker's flour. Here we describe sequencing the Ae. tauschii genome and obtaining a roughly 90-fold depth of short reads from libraries with various insert sizes, to gain a better understanding of this genetically complex plant. The assembled scaffolds represented 83.4% of the genome, of which 65.9% comprised transposable elements. We generated comprehensive RNA-Seq data and used it to identify 43,150 protein-coding genes, of which 30,697 (71.1%) were uniquely anchored to chromosomes with an integrated high-density genetic map. Whole-genome analysis revealed gene family expansion in Ae. tauschii of agronomically relevant gene families that were associated with disease resistance, abiotic stress tolerance and grain quality. This draft genome sequence provides insight into the environmental adaptation of bread wheat and can aid in defining the large and complicated genomes of wheat species. © 2013 Macmillan Publishers Limited. All rights reserved.
Wei L.,National University of Defense Technology |
Liu G.,National University of Defense Technology |
Shao Y.,National Supercomputer Center in Tianjin |
Liu J.,National Supercomputer Center in Tianjin |
Zuo Y.,National Supercomputer Center in Tianjin
2016 International Conference on Computer Communication and Informatics, ICCCI 2016 | Year: 2016
To address the redundancy, multi-dimensionality, complexity and heterogeneity of medical documents, and to mine the value hidden in huge amounts of medical document data, this paper proposes a system called MSPM based on NoSQL and MapReduce. By storing key-value pairs, complex and heterogeneous data are summarized in a unified and convenient transaction format for Apriori. Apriori is then executed in parallel through MapReduce. Finally, by generating all candidate sets non-recursively and constraining the counting to candidate sets of interest, the system addresses the low speed, high overhead and poor effectiveness of the Apriori algorithm when applied to medical data. Test results show that the optimized algorithm is effective. © 2016 IEEE.
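The sketch below shows the general shape of level-wise Apriori with support counting split into a map step and a reduce step, simulated in a single Python process. It is an illustration under assumptions, not MSPM: the transactions are made-up medical items, candidate generation is the standard iterative join-and-prune step (no recursion), and MSPM's constrained counting for candidate sets of interest is not reproduced. `map_phase`, `reduce_phase` and `generate_candidates` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def map_phase(transaction: Set[str], candidates: List[Itemset]) -> list:
    # Mapper: emit (candidate, 1) for every candidate contained in the transaction.
    return [(c, 1) for c in candidates if c <= transaction]


def reduce_phase(pairs) -> Counter:
    # Reducer: sum the counts emitted for each candidate.
    counts: Counter = Counter()
    for cand, n in pairs:
        counts[cand] += n
    return counts


def generate_candidates(frequent: List[Itemset], k: int) -> List[Itemset]:
    # Join: union pairs of frequent (k-1)-itemsets that yield a k-itemset.
    # Prune: drop candidates that have an infrequent (k-1)-subset.
    prev = set(frequent)
    cands = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return [c for c in cands
            if all(frozenset(s) in prev for s in combinations(c, k - 1))]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    items = {i for t in transactions for i in t}
    candidates = [frozenset([i]) for i in items]
    result: Dict[Itemset, int] = {}
    k = 1
    while candidates:  # iterative level-wise loop, no recursion
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        counts = reduce_phase(pairs)
        frequent = [c for c in candidates if counts[c] >= min_support]
        result.update({c: counts[c] for c in frequent})
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


transactions = [
    {"fever", "cough", "flu"},
    {"fever", "flu"},
    {"cough", "cold"},
    {"fever", "cough", "flu"},
]
frequent_sets = apriori(transactions, min_support=2)
print(len(frequent_sets))  # 7
```

In a real MapReduce job each mapper would process a shard of the transactions and the framework would group the emitted pairs by candidate before the reduce step.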
Zhu X.,National University of Defense Technology |
Zhu X.,National Supercomputer Center in Tianjin |
Liu X.,National University of Defense Technology |
Meng X.,National Supercomputer Center in Tianjin |
Feng J.,National Supercomputer Center in Tianjin
2011 International Conference on Electrical and Control Engineering, ICECE 2011 - Proceedings | Year: 2011
In this study, we test and analyze the performance of the Gyrokinetic Toroidal Code (GTC). Based on the analysis results, we port GTC's compute-intensive subroutines to the GPU and accelerate them on the CPU-GPU heterogeneous architecture of the TH-1A supercomputer. Several optimization strategies are developed in this process: for example, subroutines are combined to reduce data transfer between host and device, GPU memory access is optimized to reduce access latency, and the static keyword is placed before array declarations to avoid unnecessary address allocation and data copying. Experimental results show that the subroutines ported to the GPU run 6 to 8 times faster, and the overall performance of GTC improves by a factor of 3 to 4. © 2011 IEEE.
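The abstract does not say what fraction of GTC's runtime the ported subroutines account for. As a rough consistency check only, assuming Amdahl's law applies and pairing the low ends (6x, 3x) and high ends (8x, 4x) of the reported ranges, the implied fraction p can be recovered by inverting Amdahl's formula. This is an illustrative calculation, not a figure from the paper.

```python
def amdahl(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall: float, s: float) -> float:
    """Invert Amdahl's law: the fraction p that yields a given overall speedup."""
    # 1/overall = (1 - p) + p/s  =>  p = (1 - 1/overall) / (1 - 1/s)
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


for overall, s in [(3.0, 6.0), (4.0, 8.0)]:
    p = implied_fraction(overall, s)
    print(f"subroutine speedup {s:.0f}x, overall {overall:.0f}x -> p = {p:.3f}")
```

Under these assumptions the ported subroutines would need to cover roughly 80% to 86% of the original runtime; any remaining host-device transfer cost would push that fraction higher.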
Wang C.,Tianjin University |
Yu C.,Tianjin University |
Tang S.,Tianjin University |
Xiao J.,Tianjin University |
And 2 more authors.
Parallel Computing | Year: 2016
Dynamic programming (DP) is an important technique widely used in many scientific applications. Because of the massive volume of application data in practice, parallel and distributed DP is a necessity. However, writing a parallel and distributed DP program is difficult and error-prone because of DP's intrinsically strong data dependencies. In this paper, we present DPX10, a DAG-based distributed X10 framework that aims to simplify parallel programming for DP applications. DPX10 enables users to write highly efficient parallel DP programs with little effort. To program with DPX10, users only need to do two things: 1) instantiate a DAG pattern by specifying the dependencies between vertices of the DAG; and 2) implement the DP application's logic in the compute method of the vertices. DPX10 provides eight commonly used DAG patterns and a simple API that allows users to customize their own DAG patterns. All the tedious work of DP parallelization, including DAG distribution, task scheduling, and task communication, is hidden from users and handled by DPX10. Moreover, DPX10 is fault-tolerant and has a mechanism to handle straggler tasks, which run much slower than other tasks due to unexpected resource contention. Finally, we use four DP applications with up to 2 billion vertices running on 240 cores to demonstrate the simplicity, efficiency, and scalability of the proposed framework. © 2016 Elsevier B.V.
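DPX10 itself is written in X10; the following is a single-process Python illustration of the two-step programming model the abstract describes, not DPX10's actual API. `grid_pattern`, `run_dag` and the `compute` callback are hypothetical names. The user picks a DAG pattern (here a 2-D grid where each vertex depends on its left, upper and upper-left neighbours) and supplies per-vertex logic (here longest common subsequence); the framework side runs vertices in topological order, which is where a distributed scheduler would dispatch ready vertices to workers.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Vertex = Tuple[int, int]


def grid_pattern(rows: int, cols: int) -> Dict[Vertex, List[Vertex]]:
    # Hypothetical DAG pattern: vertex (i, j) depends on (i-1, j), (i, j-1)
    # and (i-1, j-1), the shape of many 2-D DP recurrences.
    deps = {}
    for i in range(rows):
        for j in range(cols):
            cand = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            deps[(i, j)] = [(a, b) for a, b in cand if a >= 0 and b >= 0]
    return deps


def run_dag(deps: Dict[Vertex, List[Vertex]],
            compute: Callable[[Vertex, Dict[Vertex, int]], int]) -> Dict[Vertex, int]:
    # Scheduler: Kahn's algorithm; a vertex runs once all its dependencies are done.
    indeg = {v: len(d) for v, d in deps.items()}
    children: Dict[Vertex, List[Vertex]] = {v: [] for v in deps}
    for v, d in deps.items():
        for u in d:
            children[u].append(v)
    ready = deque(v for v, n in indeg.items() if n == 0)
    values: Dict[Vertex, int] = {}
    while ready:
        v = ready.popleft()
        values[v] = compute(v, {u: values[u] for u in deps[v]})
        for w in children[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return values


def lcs_length(a: str, b: str) -> int:
    # User logic: vertex (i, j) holds the LCS length of a[:i+1] and b[:j+1].
    def compute(v: Vertex, dep_vals: Dict[Vertex, int]) -> int:
        i, j = v
        if a[i] == b[j]:
            return dep_vals.get((i - 1, j - 1), 0) + 1
        return max(dep_vals.get((i - 1, j), 0), dep_vals.get((i, j - 1), 0))

    values = run_dag(grid_pattern(len(a), len(b)), compute)
    return values[(len(a) - 1, len(b) - 1)]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

Keeping the dependency pattern separate from the per-vertex logic is what lets the same scheduler serve other DP problems with the same DAG shape.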
Dong W.,National University of Defense Technology |
Liu G.,National University of Defense Technology |
Liu G.,National Supercomputer Center in Tianjin |
Yu J.,National University of Defense Technology |
Zuo Y.,National University of Defense Technology
2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015 | Year: 2015
The performance of the storage subsystems of supercomputers cannot meet the demands of the complex applications running on them. One major cause is that the bandwidth of the storage hardware is not used efficiently, owing to complex and changing application I/O behavior. I/O characterization tools are therefore vital to application development and to orchestration of the storage system. This paper proposes an I/O characterization tool called FTracer, which captures I/O traces and performs trace analysis at runtime. To provide more flexible analysis, FTracer allows users to vary the analysis instances at runtime. This mechanism lets users obtain exactly the information they want about the I/O characteristics of their applications while the applications are running. In this work, we characterize the MADbench2 benchmark to demonstrate the capabilities of FTracer. © 2015 IEEE.
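The abstract does not describe FTracer's interfaces, so this is a hypothetical Python sketch of the general idea only: I/O events are analyzed as they are captured, and analysis instances can be attached or detached while events keep arriving. `Tracer`, `IOEvent`, `BytesPerOp` and `SequentialityCheck` are invented names.

```python
from typing import Callable, Dict, NamedTuple


class IOEvent(NamedTuple):
    op: str       # "read" or "write"
    path: str
    offset: int
    size: int     # bytes


class Tracer:
    """Captures I/O events and feeds them to analyzers that can be
    attached or detached while the traced application is running."""

    def __init__(self) -> None:
        self.analyzers: Dict[str, Callable[[IOEvent], None]] = {}

    def attach(self, name: str, analyzer: Callable[[IOEvent], None]) -> None:
        self.analyzers[name] = analyzer

    def detach(self, name: str) -> None:
        self.analyzers.pop(name, None)

    def record(self, event: IOEvent) -> None:
        # Analysis happens at capture time instead of post-processing a trace file.
        for analyzer in list(self.analyzers.values()):
            analyzer(event)


class BytesPerOp:
    """Total bytes transferred per operation type."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def __call__(self, e: IOEvent) -> None:
        self.totals[e.op] = self.totals.get(e.op, 0) + e.size


class SequentialityCheck:
    """Counts accesses that start where the previous access to the same file ended."""

    def __init__(self) -> None:
        self.next_offset: Dict[str, int] = {}
        self.sequential = 0
        self.total = 0

    def __call__(self, e: IOEvent) -> None:
        self.total += 1
        if self.next_offset.get(e.path) == e.offset:
            self.sequential += 1
        self.next_offset[e.path] = e.offset + e.size


tracer = Tracer()
volume = BytesPerOp()
tracer.attach("volume", volume)
tracer.record(IOEvent("write", "out.dat", 0, 4096))
tracer.record(IOEvent("write", "out.dat", 4096, 4096))
seq = SequentialityCheck()
tracer.attach("seq", seq)            # attached mid-run
tracer.record(IOEvent("read", "in.dat", 0, 1024))
tracer.record(IOEvent("read", "in.dat", 1024, 1024))
tracer.detach("volume")              # detached mid-run
tracer.record(IOEvent("read", "in.dat", 8192, 1024))
print(volume.totals, seq.sequential, seq.total)
```

Each analyzer only sees the events recorded while it was attached, which is the behavior the runtime-variable analysis is meant to provide.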
Fei Z.,National University of Defense Technology |
Guang-Ming L.,National Supercomputer Center in Tianjin
Proceedings - 2013 IEEE 9th International Conference on Intelligent Computer Communication and Processing, ICCP 2013 | Year: 2013
Traffic congestion has become the most intractable problem for the governments of most countries. To relieve it, it is crucial that traffic departments obtain real-time information on traffic conditions and release it to the public. The front cameras of buses record the traffic conditions on a city's main roads, so real-time traffic conditions can be obtained by analyzing the surveillance video from these cameras. We design a model to collect images from the camera and preprocess them to simplify subsequent processing. We propose an average-background approach and use background subtraction to detect vehicles. We then propose a method for calculating the weight of each image pixel under the law of perspective, and a congestion index based on vehicle pixel statistics, which is used to divide traffic congestion into levels. Finally, an experiment is designed to validate the methods. © 2013 IEEE.
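A minimal sketch of the pipeline described above (average background, background subtraction, perspective-weighted pixel counting, congestion levels), written in plain Python on tiny grayscale frames. The exact weighting and index formulas are not given in the abstract; the linear per-row weights, the threshold of 30 and the level boundaries 0.2 and 0.5 are hypothetical choices for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale frame, rows ordered top (far) to bottom (near)


def average_background(frames: List[Image]) -> Image:
    # Per-pixel mean over many frames: moving vehicles average out.
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frame: Image, background: Image, threshold: float) -> List[List[bool]]:
    # Background subtraction: a pixel counts as vehicle if it differs enough.
    return [[abs(p - b) > threshold for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def row_weights(rows: int, far: float = 3.0, near: float = 1.0) -> List[float]:
    # Hypothetical linear perspective weights: rows near the top are farther
    # from the camera, so each pixel there covers more road surface.
    if rows == 1:
        return [near]
    return [far + (near - far) * r / (rows - 1) for r in range(rows)]


def congestion_index(mask: List[List[bool]], weights: List[float]) -> float:
    # Weighted fraction of road area covered by vehicle pixels, in [0, 1].
    covered = sum(w for row, w in zip(mask, weights) for m in row if m)
    total = sum(w * len(row) for row, w in zip(mask, weights))
    return covered / total


def congestion_level(index: float, bounds=(0.2, 0.5)) -> str:
    # Hypothetical thresholds dividing the index into levels.
    if index < bounds[0]:
        return "free"
    if index < bounds[1]:
        return "slow"
    return "congested"


background = average_background([[[10.0] * 4 for _ in range(3)] for _ in range(4)])
frame = [[200.0, 200.0, 10.0, 10.0],
         [10.0, 10.0, 10.0, 10.0],
         [10.0, 10.0, 200.0, 10.0]]
mask = foreground_mask(frame, background, threshold=30.0)
idx = congestion_index(mask, row_weights(len(frame)))
print(round(idx, 3), congestion_level(idx))  # 0.292 slow
```

Without the weights, the two far vehicle pixels and the one near pixel would count equally; weighting by row compensates for far-away vehicles appearing smaller in the image.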
Liu C.,Tianjin University |
Shu Y.,Tianjin University |
Yang O.,University of Ottawa |
Xia Z.,National Supercomputer Center in Tianjin |
Xia R.,Tianjin University
Wireless Personal Communications | Year: 2013
A stable and reliable routing mechanism for vehicular ad hoc networks (VANETs) is an important step toward supporting applications with long data transmissions, such as file sharing and music download. Traditional mobile ad hoc network (MANET) routing protocols are not suitable for VANETs because the mobility model and environment of VANETs differ from those of traditional MANETs. To solve this problem, we propose a new stable routing algorithm called stable directional forward routing. The novelty of the proposed protocol is that it incorporates directional broadcast and path duration prediction into the ad hoc on-demand distance vector (AODV) routing protocol, as follows: (1) nodes in the VANET are grouped by position, and only nodes within a given direction range participate in the route discovery process, which reduces the frequency of flooded requests; (2) route selection is based on link duration rather than hop count or other metrics, which increases path duration; (3) route discovery is executed before the current path expires, which decreases end-to-end delay. The performance of the new scheme is evaluated through extensive simulations with QualNet. Simulation results indicate the benefits of the proposed routing strategy in reducing routing control packets and link-breakage events, improving the packet delivery ratio, and decreasing end-to-end delay. © 2013 Springer Science+Business Media New York.
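The abstract does not give the paper's link-duration predictor. As a stand-in, this Python sketch uses the widely used link expiration time formula for two nodes with known position, speed and heading, a common radio range r, and constant velocities, and combines it with a direction filter for route discovery and max-path-duration route selection. `Node`, `link_duration`, `in_direction`, `path_duration` and `select_route` are hypothetical names, and the sample positions and speeds are invented.

```python
import math
from typing import List


class Node:
    def __init__(self, x: float, y: float, speed: float, heading: float):
        self.x, self.y = x, y    # position (m)
        self.speed = speed       # m/s
        self.heading = heading   # radians


def link_duration(i: Node, j: Node, r: float) -> float:
    """Predicted time until currently connected nodes i and j leave range r."""
    a = i.speed * math.cos(i.heading) - j.speed * math.cos(j.heading)
    b = i.x - j.x
    c = i.speed * math.sin(i.heading) - j.speed * math.sin(j.heading)
    d = i.y - j.y
    if b * b + d * d > r * r:
        return 0.0                 # not connected now
    if a == 0 and c == 0:
        return math.inf            # identical velocities: link never breaks
    disc = (a * a + c * c) * r * r - (a * d - b * c) ** 2
    t = (-(a * b + c * d) + math.sqrt(max(disc, 0.0))) / (a * a + c * c)
    return max(t, 0.0)


def in_direction(src: Node, node: Node, dest_bearing: float, half_angle: float) -> bool:
    # Only nodes within +/- half_angle of the bearing toward the destination
    # take part in route discovery, which limits flooding.
    bearing = math.atan2(node.y - src.y, node.x - src.x)
    diff = (bearing - dest_bearing + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle


def path_duration(path: List[Node], r: float) -> float:
    # A path breaks as soon as its shortest-lived link breaks.
    return min(link_duration(u, v, r) for u, v in zip(path, path[1:]))


def select_route(paths: List[List[Node]], r: float) -> List[Node]:
    # Prefer the longest predicted path duration rather than the fewest hops.
    return max(paths, key=lambda p: path_duration(p, r))


src = Node(0, 0, 20, 0)
dst = Node(400, 0, 20, 0)
n1 = Node(200, 0, 20, 0)   # moving with the traffic
n2 = Node(200, 0, 0, 0)    # parked vehicle
via_n1, via_n2 = [src, n1, dst], [src, n2, dst]
print(path_duration(via_n2, 250), select_route([via_n2, via_n1], 250) is via_n1)
```

The route through the parked vehicle breaks after about 2.5 s, when the destination drives out of its range, so the route through the co-moving vehicle is chosen even though both have the same hop count.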