Schlütter M., Jülich Research Center |
Philippen P., Jülich Research Center |
Morin L., CAPS Entreprise |
Geimer M., Jülich Research Center |
Mohr B., Jülich Research Center
Advances in Parallel Computing | Year: 2014
In heterogeneous environments with multi-core systems and accelerators, programming and optimizing large parallel applications turns into a time-intensive and hardware-dependent challenge. To assist application developers in this process, a number of tools and high-level compilers have been developed. Directive-based programming models such as HMPP and OpenACC provide abstractions over low-level GPU programming models such as CUDA or OpenCL. The compilers developed by CAPS automatically transform the pragma-annotated application code into low-level code, thereby enabling parallelization and optimization for a given accelerator hardware. To analyze the performance of parallel applications, multiple partners in Germany and the US jointly develop the community measurement infrastructure Score-P. Score-P gathers performance execution profiles, which can be presented and analyzed within the CUBE result browser, and collects detailed event traces to be processed by post-mortem analysis tools such as Scalasca and Vampir. In this paper we present the integration and combined use of Score-P and the CAPS compilers as one approach to efficiently parallelize and optimize codes. Specifically, we describe the PHMPP profiling interface, its implementation in Score-P, and the presentation of preliminary results in CUBE. © 2014 The authors and IOS Press.
Bodin F.,CAPS Entreprise
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012
Directive-based programming models are a pragmatic way of adapting legacy codes to heterogeneous many-cores such as CPUs coupled with GPUs. They provide programmers with an abstract and portable interface for developing many-core applications. The directives are used to express parallel computation, data transfers between the CPU and GPU memories, and code tuning hints. The challenge for such environments is to achieve high programming productivity and at the same time provide performance portability across hardware platforms. In this presentation we give an overview of the state of the art of directive-based parallel programming environments for many-core accelerators. In particular, we describe OpenACC (http://www.openacc-standard.org/), an initiative from CAPS, CRAY, NVIDIA and PGI that provides a new open parallel programming standard for the C, C++ and Fortran languages. We show how tuning can be performed with such a programming approach and specifically address numerical library interoperability issues. © Springer-Verlag Berlin Heidelberg 2012.
Wang M., National University of Defense Technology |
Wang M., Wuxi Institute of Technology |
Bodin F., IRISA |
Bodin F., CAPS Entreprise
Journal of Systems Architecture | Year: 2011
Advances in semiconductor technology enable multiple processor cores to be integrated into a single chip. Heterogeneous multiprocessor systems-on-chip (MPSoCs) have become important platforms for accelerating applications. However, compilation techniques for memory management on MPSoCs still lag behind. This paper presents an automatic memory management framework to orchestrate the data movement between local memory and off-chip memory. In our framework, data alignment, hierarchical data distribution, communication generation, loop tiling, and loop splitting are employed. Moreover, a communication optimization approach is proposed to improve data reuse. These techniques reduce off-chip memory accesses and exploit data locality. Experimental results on the Cell BE show that our data management framework generates efficient code. © 2010 Elsevier B.V. All rights reserved.
Baker M., Oak Ridge National Laboratory |
Pophale S., University of Houston |
Vasnier J.-C., CAPS Entreprise |
Jin H., NASA |
Hernandez O., Oak Ridge National Laboratory
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014
With high performance systems exploiting multicore and accelerator-based architectures on distributed shared memory systems, heterogeneous hybrid programming models are the natural choice to exploit all the hardware made available on these systems. Previous efforts looking into hybrid models have primarily focused on combining OpenMP directives (for shared memory programming) with MPI (for inter-node programming on a cluster), using OpenMP to spawn threads on a node and communication libraries like MPI to communicate across nodes. As accelerators are added to the mix and hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to effectively leverage the new hardware. In this paper we explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library for one-sided communication between nodes. We use the NAS BT Multi-zone benchmark, converted to use the OpenSHMEM library API for network communication between nodes and OpenACC to exploit the accelerators present within a node. We evaluate the performance of the benchmark and discuss our experiences during the development of the OpenSHMEM+OpenACC hybrid program. © 2014 Springer International Publishing Switzerland.
Agency: Cordis | Branch: FP7 | Program: CP | Phase: ICT-2011.10.2 | Award Amount: 1.60M | Year: 2011
The objective of APOS-EU is to develop optimised versions of scientific and industrial codes which are scalable and portable across heterogeneous and homogeneous architectures. Codes will be chosen from the areas of seismic processing, magneto-hydrodynamics, percolation, molecular modelling and CFD. Scalability to thousands of cores will be a principal goal of the work. In parallel with the development of the codes, prototype tools, starting from the current state of the art, will be interactively developed, deployed and refined. The proposal falls within Part C of the call Objective ICT-2009.10.2, EU-Russia Research and Development Cooperation. This proposal is a small but important step on the road to innovative, advanced simulations and the tools to support their development. Real-world applications will be used to test and evaluate methodologies for developing massively parallel, highly scalable applications and for testing relevant tools. This will provide an important reference for code owners, both ISVs and those with in-house codes. It will also provide important feedback to tool developers on the requirements of porting real applications to the new generations of machines. The successful outcome of the project will constitute an important advance in the state of the art and will have immediate industrial and economic impact. The work in APOS-EU will be complemented by that in the Russian proposal APOS-RU, which addresses two important application areas, namely high-resolution 3-D seismic processing and atomic simulation codes based on many-body interatomic potentials. Both areas create opportunities for common work between the EU and Russian consortia on the development of algorithms and on the use and development of tools for parallelisation.