Santiago de Compostela, Spain

Gonzalez-Dominguez J.,University of La Coruna | Martin M.J.,University of La Coruna | Taboada G.L.,University of La Coruna | Tourino J.,University of La Coruna | And 3 more authors.
Concurrency Computation Practice and Experience | Year: 2012

The popularity of Partitioned Global Address Space (PGAS) languages has increased in recent years thanks to their high programmability and to the performance achievable through efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations written in the PGAS Unified Parallel C (UPC) language. The UPCBLAS routines are built on top of sequential Basic Linear Algebra Subprograms (BLAS) functions and exploit the particularities of the PGAS paradigm, taking data locality into account in order to achieve good performance. Furthermore, the routines implement other optimization techniques, several of which automatically adapt to the hardware characteristics of the underlying system. The library has been experimentally evaluated on a multicore supercomputer and compared with a message-passing-based parallel numerical library, demonstrating good scalability and efficiency. Copyright © 2012 John Wiley & Sons, Ltd.
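The abstract's central idea is that a PGAS library can assign each thread a block of matrix rows so that the multiplication touches mostly local data. The following sketch (not UPCBLAS's actual API; all names are illustrative) shows a balanced row-block distribution and a dense matrix-vector product computed block by block, as one thread per block would:

```python
def local_row_block(n_rows, n_threads, tid):
    """Return the [start, end) row range owned by thread `tid`
    under a balanced block distribution."""
    base, extra = divmod(n_rows, n_threads)
    start = tid * base + min(tid, extra)
    end = start + base + (1 if tid < extra else 0)
    return start, end

def dgemv_blocked(A, x, n_threads):
    """Dense y = A @ x computed block by block, one block per 'thread',
    mimicking the locality-aware layout a PGAS library can exploit."""
    n = len(A)
    y = [0.0] * n
    for tid in range(n_threads):
        start, end = local_row_block(n, n_threads, tid)
        for i in range(start, end):  # rows local to this thread
            y[i] = sum(a * b for a, b in zip(A[i], x))
    return y
```

In a real UPC program the outer loop would disappear: each thread would execute only its own block, with the shared vector `x` replicated or fetched as needed.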

Pichel J.C.,Galicia Supercomputing Center | Heras D.B.,University of Santiago de Compostela | Cabaleiro J.C.,University of Santiago de Compostela | GarciaLoureiro A.J.,University of Santiago de Compostela | Rivera F.F.,University of Santiago de Compostela
International Journal of High Performance Computing Applications | Year: 2010

Irregular codes are present in many scientific applications, such as finite element simulations. These simulations require the solution of large sparse linear systems, which are often solved using iterative methods. The main kernel of these methods is the sparse matrix-vector multiplication, which frequently exhibits irregular data accesses. Therefore, techniques that increase the performance of this operation have a great impact on the overall performance of the iterative method and, as a consequence, on the simulations. In this paper a technique for improving the locality of sparse matrix codes is presented. It consists of reorganizing the data guided by a locality model, instead of restructuring the code or changing the sparse matrix storage format. We have applied our proposal to different iterative methods provided by two standard numerical libraries. Results show a clear improvement in the overall performance of the considered iterative methods, owing to the increased locality of the sparse matrix-vector product. Noticeable reductions in execution time have been achieved in both sequential and parallel executions. This positive behavior allows the reordering technique to be successfully applied to real problems. We have focused on the simulation of semiconductor devices, in particular on the BIPS3D simulator, into which the technique was integrated. Both sequential and parallel executions are analyzed extensively in this paper, and noticeable reductions in the execution time required by the simulations are observed when using our reordered matrices in comparison with the original simulator. © 2010 The Author(s).
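The kernel the paper targets, sparse matrix-vector multiplication, can be sketched for the common compressed sparse row (CSR) format. The indirect indexing into `x` through `col_idx` is exactly the irregular access pattern whose locality the data reordering improves (the sketch is generic; it is not code from the paper):

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A stored in CSR format.
    values[k] holds the nonzeros row by row, col_idx[k] their columns,
    and row_ptr[i]:row_ptr[i+1] delimits row i's entries."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]  # irregular gather from x
        y[i] = s
    return y
```

A reordering technique such as the paper's permutes rows and columns so that the `col_idx` values touched by consecutive rows cluster together, improving cache reuse of `x` without changing this loop or the CSR format.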

Mallon D.A.,Julich Research Center | Taboada G.L.,University of La Coruna | Teijeiro C.,University of La Coruna | Gonzalez-Dominguez J.,University of La Coruna | And 2 more authors.
Cluster Computing | Year: 2014

The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with hierarchies of processor cores, accessible via complex interconnects, in order to dispatch the increasing amount of data required by the processing elements. The key to the efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, as it avoids the unnecessary synchronization between pairs of processes that arises when collective operations are implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining, and the available NUMA binding features. An implementation has been developed for Unified Parallel C (UPC), a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures. © 2014, Springer Science+Business Media New York.
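The hierarchical-tree idea behind such collectives can be illustrated with a binomial-tree reduction: in each round, half of the still-active ranks combine their partner's value (with one-sided communication, a simple remote get), so a reduction over P ranks completes in about log2(P) rounds instead of P-1 sequential steps. This is a generic sequential simulation, not the paper's UPC implementation:

```python
def tree_reduce(values, op):
    """Simulate a binomial-tree reduction: vals[p] is rank p's local
    contribution; rank 0 ends up holding op applied over all of them."""
    vals = list(values)
    n = len(vals)
    step = 1
    while step < n:
        # In round log2(step), rank p combines rank (p + step)'s value.
        for p in range(0, n, 2 * step):
            if p + step < n:
                vals[p] = op(vals[p], vals[p + step])
        step *= 2
    return vals[0]  # the root (rank 0) holds the result
```

The paper's algorithms refine this pattern with NUMA-aware tree shapes (grouping ranks that share a node) and pipelining of large messages, but the round structure is the same.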

Teijeiro C.,University of La Coruna | Taboada G.L.,University of La Coruna | Tourino J.,University of La Coruna | Doallo R.,University of La Coruna | And 3 more authors.
Journal of Computer Science and Technology | Year: 2013

Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while taking advantage of the scalability of distributed memory architectures. UPC thus allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multicore clusters, in a more productive way, accessing remote memory through high-level language constructs such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes only a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source or destination thread, or the definition of the amount of data transferred by each particular thread. This library fulfills the demands of the UPC developer community and implements portable algorithms that are independent of the specific UPC compiler/runtime being used. A representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results confirm the suitability of the new library for easier programming without trading off performance, thus achieving high productivity in parallel programming and harnessing the performance of hybrid shared/distributed memory architectures in high performance computing. © 2013 Springer Science+Business Media New York & Science Press, China.
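One of the extensions the abstract mentions, letting each thread transfer a different amount of data, corresponds to a variable-count scatter (analogous to MPI's `MPI_Scatterv`). A minimal sketch of the data movement, with names that are illustrative rather than the library's actual API:

```python
def scatter_v(data, counts):
    """Split the root's buffer so that thread i receives counts[i]
    elements, instead of the fixed per-thread size of a standard
    scatter. Returns chunks[i], thread i's local buffer."""
    assert sum(counts) == len(data), "counts must cover the whole buffer"
    chunks, offset = [], 0
    for c in counts:
        chunks.append(data[offset:offset + c])
        offset += c
    return chunks
```

In UPC the chunks would be copied from the source thread's shared buffer into each thread's private memory; the point of the extended interface is that `counts` may be non-uniform, which the eight standard primitives do not allow.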

Cotelo C.,Galicia Supercomputing Center | Gomez A.,Galicia Supercomputing Center | Lopez J.I.,Galicia Supercomputing Center | Mera D.,University of Santiago de Compostela | And 3 more authors.
Future Generation Computer Systems | Year: 2010

Retelab is a virtual laboratory for the oceanographic research community. It is supported by a Grid infrastructure, and its main objective is to provide an easy and useful tool for oceanographers in which computer skills are not an obstacle. To achieve these goals, Retelab includes improved versions of portal and Grid technologies related to security, data access, and job management. A solution based on a role-based access management model has been built for user access and registration, seeking a balance between simplicity and robustness. The sharing and discovery of scientific data are accomplished through a virtual database focused on metadata and designed specifically to store geospatial information. Finally, a convenient and transparent procedure to submit and monitor jobs has been developed. It is based on the integration and adaptation of the GridWay metascheduler into the multiuser portal environment, in such a way that a single UNIX account can use several proxy certificates. The virtual laboratory has been tested through the implementation and deployment of several oceanographic applications. © 2010 Elsevier B.V. All rights reserved.
