Solca R.,ETH Zurich | Haidar A.,University of Tennessee at Knoxville | Tomov S.,University of Tennessee at Knoxville | Schulthess T.C.,ETH Zurich | And 3 more authors.
Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 | Year: 2012

The adoption of hybrid GPU-CPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but can benefit from the massive compute performance concentrated on a single-node, hybrid GPU-CPU system. However, hybrid systems call for the development of new algorithms that efficiently exploit heterogeneity and massive parallelism of not just GPUs, but of multi/many-core CPUs as well. Addressing these demands, we developed a novel algorithm featuring innovative fine-grained memory-aware tasks, hybrid execution/scheduling, and increased computational intensity. The resulting eigensolvers are state-of-the-art in HPC, significantly outperforming existing libraries. We describe the algorithm and analyze its performance impact on applications of interest when different fractions of eigenvectors are needed by the host electronic structure code. © 2012 IEEE.


Haidar A.,University of Tennessee at Knoxville | Tomov S.,University of Tennessee at Knoxville | Dongarra J.,University of Tennessee at Knoxville | Dongarra J.,Oak Ridge National Laboratory | And 4 more authors.
International Journal of High Performance Computing Applications | Year: 2014

The adoption of hybrid CPU-GPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but can benefit from the massive computing power concentrated on a single-node, hybrid CPU-GPU system. However, hybrid systems call for the development of new algorithms that efficiently exploit heterogeneity and massive parallelism of not just GPUs, but of multicore/manycore CPUs as well. Addressing these demands, we developed a generalized eigensolver featuring novel algorithms of increased computational intensity (compared with the standard algorithms), decomposition of the computation into fine-grained memory aware tasks, and their hybrid execution. The resulting eigensolvers are state-of-the-art in high-performance computing, significantly outperforming existing libraries. We describe the algorithm and analyze its performance impact on applications of interest when different fractions of eigenvectors are needed by the host electronic structure code. © The Author(s) 2013.


Haidar A.,University of Tennessee at Knoxville | Solca R.,ETH Zurich | Gates M.,University of Tennessee at Knoxville | Tomov S.,University of Tennessee at Knoxville | And 5 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Today's high computational demands from engineering fields and complex hardware development make it necessary to develop and optimize new algorithms toward achieving high performance and good scalability on the next generation of computers. The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe and analyze a successful methodology to address the challenges (starting from our algorithm design, kernel optimization and tuning, to our programming model) in the development of a scalable high-performance generalized eigenvalue solver in the context of electronic structure calculations in materials science applications. We developed a set of leading-edge dense linear algebra algorithms, as part of a generalized eigensolver, featuring fine-grained memory-aware kernels, a task-based approach, and hybrid execution/scheduling. The goal of the new design is to increase the computational intensity of the major compute kernels and to reduce synchronization and data transfers between GPUs. We report the performance impact on the generalized eigensolver when different fractions of eigenvectors are needed. The algorithm described provides an enormous performance boost compared to current GPU-based solutions, and performance comparable to state-of-the-art distributed solutions, using a single node with multiple GPUs. © 2013 Springer-Verlag.
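
For readers unfamiliar with the problem these abstracts address, a Hermitian-definite generalized eigenvalue problem is commonly reduced to a standard one through a Cholesky factorization. The sketch below shows that textbook reduction only; it is not taken from the papers listed here, and the symbols A, B, L, C, x, y are notation assumed for illustration.

```latex
% Textbook reduction of a Hermitian-definite generalized eigenproblem.
% Notation (A, B, L, C, x, y) is assumed here, not quoted from the abstracts.
\begin{align*}
A x &= \lambda B x, & A &= A^{H},\quad B = B^{H} \succ 0 \\
B &= L L^{H} & &\text{(Cholesky factorization)} \\
C &= L^{-1} A L^{-H}, & C y &= \lambda y \\
x &= L^{-H} y & &\text{(back-transformation)}
\end{align*}
```

When only k of the n eigenvectors are requested, the back-transformation is applied to k columns rather than n, which is one reason the fraction of eigenvectors needed affects the overall run time.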
