Rennes, France

Schlutter M.,Jülich Research Center | Philippen P.,Jülich Research Center | Morin L.,CAPS Entreprise | Geimer M.,Jülich Research Center | Mohr B.,Jülich Research Center
Advances in Parallel Computing | Year: 2014

In heterogeneous environments with multi-core systems and accelerators, programming and optimizing large parallel applications turns into a time-intensive and hardware-dependent challenge. To assist application developers in this process, a number of tools and high-level compilers have been developed. Directive-based programming models such as HMPP and OpenACC provide abstractions over low-level GPU programming models, such as CUDA or OpenCL. The compilers developed by CAPS automatically transform the pragma-annotated application code into low-level code, thereby allowing the parallelization and optimization for a given accelerator hardware. To analyze the performance of parallel applications, multiple partners in Germany and the US jointly develop the community measurement infrastructure Score-P. Score-P gathers performance execution profiles, which can be presented and analyzed within the CUBE result browser, and collects detailed event traces to be processed by post-mortem analysis tools such as Scalasca and Vampir. In this paper we present the integration and combined use of Score-P and the CAPS compilers as one approach to efficiently parallelize and optimize codes. Specifically, we describe the PHMPP profiling interface, its implementation in Score-P, and the presentation of preliminary results in CUBE. © 2014 The authors and IOS Press.


Bodin F.,CAPS Entreprise
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

Directive-based programming models are a pragmatic way of adapting legacy codes to heterogeneous many-cores such as CPUs coupled with GPUs. They provide programmers with an abstracted and portable interface for developing many-core applications. The directives are used to express parallel computation, data transfers between the CPU and the GPU memories, and code tuning hints. The challenge for such environments is to achieve high programming productivity and at the same time provide performance portability across hardware platforms. In this presentation we give an overview of the state of the art of directive-based parallel programming environments for many-core accelerators. In particular, we describe OpenACC (http://www.openacc-standard.org/), an initiative from CAPS, CRAY, NVIDIA and PGI that provides a new open parallel programming standard for the C, C++ and Fortran languages. We show how tuning can be performed in such a programming approach and specifically address numerical library inter-operability issues. © Springer-Verlag Berlin Heidelberg 2012.
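
As an illustration of the three roles the abstract assigns to directives (parallel computation, CPU/GPU data transfers, and tuning hints), here is a minimal OpenACC sketch in C. It is not taken from the presentation: the function name saxpy and the vector length of 128 are assumptions chosen for the example. A compiler without OpenACC support ignores the pragmas and runs the loop serially on the CPU.

```c
/* Hypothetical example: y = a*x + y over n elements. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* Data transfers: copy x to the accelerator; copy y in and back out. */
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        /* Parallel computation; vector_length(128) is a tuning hint
           (the value 128 is an arbitrary assumption, not a recommendation). */
        #pragma acc parallel loop vector_length(128)
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

Because the directives are annotations on ordinary C code, the same source can be built for a CPU-only target or for an accelerator, which is the portability argument made above.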


Baker M.,Oak Ridge National Laboratory | Pophale S.,University of Houston | Vasnier J.-C.,CAPS Entreprise | Jin H.,NASA | Hernandez O.,Oak Ridge National Laboratory
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

With high performance systems exploiting multicore and accelerator-based architectures on a distributed shared memory system, heterogeneous hybrid programming models are the natural choice to exploit all the hardware made available on these systems. Previous efforts looking into hybrid models have primarily focused on using OpenMP directives (for shared memory programming) with MPI (for inter-node programming on a cluster), using OpenMP to spawn threads on a node and communication libraries like MPI to communicate across nodes. As accelerators are added to the mix and hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to effectively leverage the new hardware. In this paper we explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library for one-sided communication between nodes. We use the NAS-BT Multi-zone benchmark that was converted to use the OpenSHMEM library API for network communication between nodes and OpenACC to exploit accelerators that are present within a node. We evaluate the performance of the benchmark and discuss our experiences during the development of the OpenSHMEM+OpenACC hybrid program. © 2014 Springer International Publishing Switzerland.


Trademark
CAPS Entreprise | Date: 2012-03-20

Computer software for many-core programming; computer software development tools for developing and deploying parallel applications. Retail store services featuring computer software; online retail store services featuring computer software for developing and deploying parallel applications; wholesale store services for computer software; wholesale ordering services featuring computer software for developing and deploying parallel applications. Computer training, namely, training users in the use of computer software for developing and deploying parallel applications; arranging and conducting of colloquiums, conferences, congresses, seminars, arranging and conducting of training workshops in the fields of parallel computing. Design and development of computer hardware and software; design and development of computer hardware and software for developing and deploying parallel applications; development design, installation, maintenance, updating or rental services for computer software; development design, installation, maintenance, updating or rental services for computer software for developing and deploying parallel applications; computer programming services; conversion of data and computer programs, other than physical conversion; research and development of new computer products for others; technological and scientific monitoring of network services; technology advice relating to computer software, relating to developing and deploying parallel applications and in the field of improving the performance of multi-core processors.


Trademark
CAPS Entreprise | Date: 2012-03-20

Computer software for many-core programming; computer software development tools for developing and deploying parallel applications. Retail store services featuring computer software; online retail store services featuring computer software for developing and deploying parallel applications; wholesale store services for computer software; wholesale ordering services featuring computer software for developing and deploying parallel applications. Computer training; training users in the use of computer software for developing and deploying parallel applications; arranging and conducting of colloquiums, conferences, congresses, seminars, arranging and conducting of training workshops in the fields of parallel computing. Design and development of computer hardware and software; design and development of computer hardware and software for developing and deploying parallel applications; development design, installation, maintenance, updating or rental services for computer software; development design, installation, maintenance, updating or rental services for computer software for developing and deploying parallel applications; computer programming services; conversion of data and computer programs other than physical conversion; research and development of new computer products for others; technological and scientific monitoring of network systems; technology advice relating to computer software, relating to developing and deploying parallel applications and in the field of improving the performance of multi-core processors.


Grant
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2009.8.1 | Award Amount: 8.21M | Year: 2010

Dataflow parallelism is key to achieving power efficiency, reliability, efficient parallel programmability, scalability, and data bandwidth. In this project we propose dataflow both at task level and inside the threads, to offload and manage accelerated codes, to localize the computation, to manage the fault information with appropriate protocols, to easily migrate code to the available/working components and to respect the power/performance/temperature/reliability envelope, to efficiently handle the parallelism and have an easy and powerful execution model, and to produce a more predictable behavior. While parallel systems have been around for many years, they were usually programmed and tuned by experts. In the future, large-scale systems will be widely available, and therefore exploiting the available parallelism efficiently will have to be easy enough to be accessible to the common user. Traditional programming models are either not very efficient for every application (message passing) or difficult to scale (shared memory). In order to address the programmability challenge we propose the use of a compiler directive based model to support an underlying dataflow-based thread execution that is known to exploit the available parallelism well and to efficiently move around large amounts of data. In particular we propose to use a model that offers dataflow scheduling of parallel execution threads. Combining multithreading with dataflow makes it possible to exploit the available parallelism without the overheads of the original dataflow techniques. The multithreading dataflow model is expected to perform well for a number of classes of applications. An important contribution is provided by Prof. Gao's team, which has been developing dataflow concepts for decades and has joined the TERAFLUX project after its initial phase. TERAFLUX is now bringing together top experts in dataflow from both Europe and the Americas, with the aim of demonstrating for the first time the efficiency of the dataflow concept for the Exascale parallel computers of 2020 and beyond.
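
The idea of dataflow scheduling of threads can be sketched in a few lines of C: a thread carries a count of inputs it is still waiting for and becomes runnable only when that count reaches zero. This is a single-threaded simulation under stated assumptions, not the TERAFLUX execution model or API; the names df_thread, df_init, df_write, and df_demo are hypothetical, and each input slot is assumed to be written exactly once.

```c
#define DF_MAX_INPUTS 4

/* Hypothetical dataflow thread: fires once all of its inputs have arrived. */
typedef struct {
    int pending;                      /* inputs still missing */
    int inputs[DF_MAX_INPUTS];        /* input slots */
    int (*body)(const int *inputs);   /* work to run when ready */
    int result;                       /* valid only after firing */
    int done;                         /* 1 once the thread has fired */
} df_thread;

void df_init(df_thread *t, int n_inputs, int (*body)(const int *))
{
    t->pending = n_inputs;
    t->body = body;
    t->result = 0;
    t->done = 0;
}

/* Deliver one input (each slot must be written once).
   Returns 1 if this write made the thread fire, 0 otherwise. */
int df_write(df_thread *t, int slot, int value)
{
    t->inputs[slot] = value;
    if (--t->pending == 0) {
        t->result = t->body(t->inputs);
        t->done = 1;
        return 1;
    }
    return 0;
}

static int add2(const int *in) { return in[0] + in[1]; }
static int mul2(const int *in) { return in[0] * in[1]; }

/* Evaluates (1 + 2) * (3 + 4) as a three-node dataflow graph. */
int df_demo(void)
{
    df_thread sum_a, sum_b, prod;
    df_init(&sum_a, 2, add2);
    df_init(&sum_b, 2, add2);
    df_init(&prod, 2, mul2);

    /* Inputs may arrive in any order; execution order follows the data. */
    df_write(&sum_b, 1, 4);
    df_write(&sum_a, 0, 1);
    if (df_write(&sum_a, 1, 2)) df_write(&prod, 0, sum_a.result);
    if (df_write(&sum_b, 0, 3)) df_write(&prod, 1, sum_b.result);

    return prod.done ? prod.result : -1;
}
```

In a real runtime the ready threads would be dispatched to cores by a scheduler instead of being run inline in df_write; the inline call keeps the sketch short.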


Trademark
CAPS Entreprise | Date: 2011-12-21

Computer software (recorded computer programs); computer software for developing and deploying parallel applications. Retail services of computer software; retail sales services for computer software for developing and deploying parallel applications; wholesale sales services for computer software; wholesale sales services for computer software for developing and deploying parallel applications. Computer and software design and development services; design and development services for computers and computer software for developing and deploying parallel applications; services for software design, development, installation, maintenance, updating and rental; services for the design (development), installation, maintenance, updating or rental of computer software for developing and deploying parallel applications; computer programming; research and development of new computer products for third parties; publishing of computer programs.


Grant
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2011.10.2 | Award Amount: 1.60M | Year: 2011

The objective of APOS-EU is to develop optimised versions of scientific and industrial codes which are scalable and portable across heterogeneous and homogeneous architectures. Codes will be chosen from the areas of seismic processing, magneto-hydrodynamics, percolation, molecular modelling and CFD. Scalability to thousands of cores will be a principal goal of the work. In parallel with the development of the codes, prototype tools, starting from the current state of the art, will be interactively developed, deployed and refined. The proposal falls within Part C of the call Objective ICT-2009.10.2 EU-Russia Research and Development Cooperation. This proposal is a small but important step on the road to innovative, advanced simulations and tools to support their development. Real-world applications will be used to test and evaluate methodologies for developing massively parallel, highly scalable applications and for testing relevant tools. This will provide an important reference to code owners, both ISVs and those with in-house codes. It will provide important feedback to tool developers on the requirements of porting real applications to the new generations of machines. The successful outcome of the project will constitute an important advance in the state of the art and will have immediate industrial and economic impact. The work in APOS-EU will be complemented by that in the Russian proposal APOS-RU, which addresses two important application areas, namely high-resolution 3-D seismic processing and atomic simulation codes based on many-body interatomic potentials. Both these areas create opportunities for common work between the EU and Russian consortia on the development of algorithms and on the use and development of tools for parallelisation.


Grant
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2011.3.4 | Award Amount: 3.08M | Year: 2011

Performance analysis and tuning is an important step in programming multicore-based parallel architectures. While performance analysis tools exist that help the developer in analyzing the application performance, these tools do not give any recommendations on how to tune the code. AutoTune will extend Periscope, an automatic online and distributed performance analysis tool developed by Technische Universität München, with automatic online tuning plugins for performance and energy efficiency tuning. The resulting Periscope Tuning Framework will be able to tune serial and parallel codes with/without GPU kernels and will return tuning recommendations that can be integrated into the production version of the code. The whole tuning process, consisting of automatic performance analysis and automatic tuning, will be executed online, i.e., during a single run of the application. The research results of AutoTune will be integrated into a commercial development environment of a European SME and validated with real-world codes. Results will be widely disseminated through high-quality publications, workshops and conferences, and the large user-base of a computing center and will influence teaching activities of the academic partners. The consortium unites European experts and comprises world-class universities, a major European supercomputing center, an innovative SME, as well as a major IT company, and has the required expertise to accomplish the aims of AutoTune.

