Exascale Computing Research Center

France


Didelot S., Exascale Computing Research Center | Didelot S., University of Versailles | Carribault P., Exascale Computing Research Center | Carribault P., University of Versailles | And 6 more authors.
Computing | Year: 2014

With the rising complexity of parallel applications, the need for computational power is continually growing. Recent trends in High-Performance Computing (HPC) have shown that improvements in single-core performance will not be sufficient to face the challenges of an exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to Exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message-progression technique based on Collaborative Polling, which enables an efficient, auto-adaptive overlap of communication phases with computation. The approach is novel in that it increases an application's overlap potential without introducing the overhead of a threaded message progression. We implemented our approach for InfiniBand inside a thread-based MPI runtime called MPC. We evaluate the gains from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, showing improvements in communication time of up to a factor of 2. © 2013 Springer-Verlag Wien.
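
In a thread-based MPI runtime such as MPC, every rank on a node runs as a thread and shares the node's network context, so a rank that is idle in a wait call can drive message progression for its neighbors. The C sketch below is only a minimal illustration of that idea under assumed helpers (poll_completion_queue, deliver, and request_done are hypothetical, not MPC's internals or the InfiniBand verbs API):

#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical shared network context: one completion queue per
 * node, shared by all MPI ranks running as threads. */
typedef struct { int msg_tag; int dest_rank; } completion_t;

static pthread_mutex_t cq_lock = PTHREAD_MUTEX_INITIALIZER;

extern bool poll_completion_queue(completion_t *c); /* non-blocking poll */
extern void deliver(const completion_t *c); /* hand message to its rank */
extern bool request_done(int my_tag);       /* has my message arrived? */

/* Collaborative wait: while this rank's own message has not arrived,
 * poll the shared queue and progress messages belonging to *any*
 * rank on the node, not just our own. */
void collaborative_wait(int my_tag)
{
    while (!request_done(my_tag)) {
        if (pthread_mutex_trylock(&cq_lock) == 0) { /* one poller at a time */
            completion_t c;
            while (poll_completion_queue(&c))
                deliver(&c); /* progress whichever rank it targets */
            pthread_mutex_unlock(&cq_lock);
        } else {
            sched_yield(); /* another rank is already polling */
        }
    }
}

The point of the scheme is that polling work is absorbed into cycles a rank would otherwise waste waiting, which is why the overlap adapts automatically to the application's behavior instead of requiring a dedicated progression thread.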


Maheo A., Exascale Computing Research Center | Koliai S., Exascale Computing Research Center | Carribault P., Exascale Computing Research Center | Carribault P., CEA DAM Ile-de-France | And 3 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

The advent of multicore processors argues for hybrid programming models such as MPI+OpenMP. OpenMP runtimes must therefore deliver solid performance from a small number of threads (one MPI task per socket, OpenMP inside each socket) up to a large number of threads (one MPI task per node, OpenMP inside each node). To tackle this issue, we propose a mechanism that improves the performance of thread synchronization across a large spectrum of thread counts. It relies on a hierarchical tree that is traversed differently depending on the number of threads inside the parallel region. Our approach delivers high performance for both thread activation (the parallel construct) and thread synchronization (the barrier construct). Several prior papers study hierarchical structures for launching and synchronizing OpenMP threads [1, 2]; they test tree-based approaches to distribute and synchronize threads, but they do not explore mixed hierarchical solutions. © 2012 Springer-Verlag.
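
As a rough illustration of the mechanism (the structure and names below are assumptions for illustration, not the runtime's actual code), a sense-reversing combining-tree barrier groups threads hierarchically, for instance per core, then per socket, then per node; building the tree shallow for few threads and deeper for many is what lets a single algorithm serve both ends of the spectrum:

#include <stdatomic.h>
#include <stddef.h>

/* One node of the synchronization tree; each group of threads
 * shares an arrival counter at its level. */
typedef struct tree_node {
    atomic_int arrived;       /* arrivals so far at this level */
    int expected;             /* group size at this level */
    struct tree_node *parent; /* NULL at the root */
} tree_node_t;

static atomic_int release_sense; /* flipped by the root to wake everyone */

/* The last thread to arrive at each node climbs one level up; the
 * thread that completes the root flips the sense flag, releasing
 * all the threads spinning below. */
void tree_barrier(tree_node_t *leaf, int my_sense)
{
    tree_node_t *n = leaf;
    while (n != NULL && atomic_fetch_add(&n->arrived, 1) + 1 == n->expected) {
        atomic_store(&n->arrived, 0); /* reset the node for reuse */
        n = n->parent;                /* we were last: climb */
    }
    if (n == NULL)
        atomic_store(&release_sense, my_sense); /* root reached: release */
    else
        while (atomic_load(&release_sense) != my_sense)
            ; /* spin until the root flips the flag */
}

Callers alternate my_sense between 1 and 0 on successive barriers so the flag never needs a global reset; with a single-level tree this degenerates to a flat counter barrier, which is the cheap case for small thread counts.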


Didelot S., Exascale Computing Research Center | Didelot S., University of Versailles | Carribault P., Exascale Computing Research Center | Carribault P., CEA DAM Ile-de-France | And 4 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

With the rising complexity of parallel applications, the need for computational power is continually growing. Recent trends in High-Performance Computing (HPC) have shown that improvements in single-core performance will not be sufficient to face the challenges of an Exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to Exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message-progression technique based on Collaborative Polling, which enables an efficient, auto-adaptive overlap of communication phases with computation. The approach is novel in that it increases an application's overlap potential without introducing the overhead of a threaded message progression. © 2012 Springer-Verlag.
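
For contrast with the Collaborative Polling sketch given under the journal version of this work above, the alternative this abstract argues against is a dedicated progression thread. A hypothetical minimal version (same assumed helpers as before) makes the standing cost visible:

#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

typedef struct { int msg_tag; int dest_rank; } completion_t;
extern bool poll_completion_queue(completion_t *c); /* non-blocking poll */
extern void deliver(const completion_t *c);

static volatile bool shutting_down = false;

/* A dedicated progression thread polls the network continuously,
 * whether or not any rank is waiting. Progression is guaranteed,
 * but the thread steals cycles (or a whole core) from computation;
 * Collaborative Polling avoids this standing cost by polling only
 * from ranks that are already idle anyway. */
static void *progression_loop(void *arg)
{
    (void)arg;
    completion_t c;
    while (!shutting_down) {
        while (poll_completion_queue(&c))
            deliver(&c);
        sched_yield(); /* yields, but still wakes up constantly */
    }
    return NULL;
}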


Kashnikov Y., University of Versailles | Kashnikov Y., Exascale Computing Research Center | Beyler J.C., Intel Corporation | Beyler J.C., Exascale Computing Research Center | And 2 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Software engineers depend heavily on compiler technology to create efficient programs. Execution time is currently the most important criterion in the HPC field; to optimize it, users typically apply the common compiler option -O3. This paper extensively tests the other available performance options and concludes that, although older compiler versions could benefit from combinations of compiler flags, modern compilers perform admirably at the commonly used -O3 level. The paper presents the Universal Learning Machine (ULM) framework, which combines several tools to predict the best flags from data gathered offline. The ULM framework evaluates three hundred kernels extracted from 144 benchmark applications, automatically processing more than ten thousand compiler flag combinations for each kernel. To make the study complete, the experimental setup includes three modern mainstream compilers and four different architectures. For 62% of the kernels, the optimal setting is the generic optimization level -O3. For the remaining 38%, an extension to the ULM framework allows a user to instantly obtain the optimal flag combination through a static prediction method. The prediction method examines several well-known machine learning algorithms, including Nearest Neighbor, Stochastic Gradient Descent, and Support Vector Machines (SVM); SVM gave the best results, with a 92% accuracy rate on the considered kernels. © Springer-Verlag Berlin Heidelberg 2013.
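
The offline half of such a framework is essentially a brute-force driver: build each kernel under every candidate flag combination, time the result, and record the winner as training data for the classifier. The sketch below illustrates that loop; the flag list, file names, and timing harness are illustrative assumptions, not the ULM tooling itself:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* A few illustrative candidate flag sets; ULM explores thousands. */
static const char *flag_sets[] = {
    "-O2",
    "-O3",
    "-O3 -funroll-loops",
    "-O3 -ffast-math",
};

/* Compile kernel.c with the given flags, run it, and return the
 * elapsed wall-clock time (a huge value signals failure). */
static double benchmark(const char *flags)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd, "cc %s -o kernel kernel.c", flags);
    if (system(cmd) != 0)
        return 1e30; /* this combination failed to build */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (system("./kernel") != 0)
        return 1e30;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    size_t n = sizeof flag_sets / sizeof *flag_sets, best = 0;
    double best_time = 1e30;

    for (size_t i = 0; i < n; i++) {
        double t = benchmark(flag_sets[i]);
        printf("%-30s %.4f s\n", flag_sets[i], t);
        if (t < best_time) { best_time = t; best = i; }
    }
    printf("best: %s\n", flag_sets[best]);
    return 0;
}

Pairing each winning flag set with static features of the kernel (for example loop structure or instruction mix) is what lets the SVM stage predict a flag combination for unseen code without rerunning this search.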
