Galicia Supercomputing Center

Santiago de Compostela, Spain


Mallon D.A.,Jülich Research Center | Taboada G.L.,University of La Coruña | Teijeiro C.,University of La Coruña | Gonzalez-Dominguez J.,University of La Coruña | And 2 more authors.
Cluster Computing | Year: 2014

The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor core hierarchies, accessible via complex interconnects, in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid unnecessary synchronization between pairs of processes in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures. © 2014, Springer Science+Business Media New York.
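
As a rough illustration of tree-based collectives, the sketch below computes the communication rounds of a flat binomial-tree broadcast. The abstract does not say which tree shapes the proposed algorithms use, so the binomial tree, the function name `binomial_broadcast_schedule` and the process numbering are assumptions made for illustration only; the hierarchical, NUMA-aware and pipelined aspects described above are not modeled.

```python
def binomial_broadcast_schedule(num_procs, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) pairs. Ranks are taken
    relative to the root, so any rank can act as the source.
    Hypothetical helper, not taken from the paper.
    """
    rounds = []
    step = 1
    while step < num_procs:
        pairs = []
        # Every rank that already holds the data (relative ranks 0..step-1)
        # forwards it to the rank `step` positions further on.
        for rel in range(step):
            target = rel + step
            if target < num_procs:
                sender = (rel + root) % num_procs
                receiver = (target + root) % num_procs
                pairs.append((sender, receiver))
        rounds.append(pairs)
        step *= 2
    return rounds


schedule = binomial_broadcast_schedule(8)
for number, pairs in enumerate(schedule):
    print(f"round {number}: {pairs}")
```

With 8 processes the data reaches every process in 3 rounds, because the set of processes holding the data doubles in each round.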

Teijeiro C.,University of La Coruña | Taboada G.L.,University of La Coruña | Tourino J.,University of La Coruña | Doallo R.,University of La Coruña | And 3 more authors.
Journal of Computer Science and Technology | Year: 2013

Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developers community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing. © 2013 Springer Science+Business Media New York & Science Press, China.

Cotelo C.,Galicia Supercomputing Center | Gomez A.,Galicia Supercomputing Center | Lopez J.I.,Galicia Supercomputing Center | Mera D.,University of Santiago de Compostela | And 3 more authors.
Future Generation Computer Systems | Year: 2010

Retelab is a virtual laboratory for the oceanographic research community. It is supported by a Grid infrastructure and its main objective is to provide an easy and useful tool for oceanographers, so that computer skills are not an obstacle. To achieve these goals, Retelab includes improved versions of portal and Grid technologies related to security, data access, and job management. A solution based on a Role Access Management Model has been built for user access and registration, seeking a balance between simplicity and robustness. The sharing and discovery of scientific data is accomplished using a virtual database focused on metadata and designed specifically to store geospatial information. Finally, a convenient and transparent procedure to submit and monitor jobs has been developed. It is based on the integration and adaptation of the GridWay metascheduler to the multiuser portal environment in such a way that a single UNIX account can use several proxy certificates. The Virtual Laboratory has been tested by the implementation and deployment of several oceanographic applications. © 2010 Elsevier B.V. All rights reserved.

Pineiro A.,University of Santiago de Compostela | Pardo V.,University of Santiago de Compostela | Baldomir D.,University of Santiago de Compostela | Rodriguez A.,Galicia Supercomputing Center | And 3 more authors.
Journal of Physics Condensed Matter | Year: 2012

The chemical influence on the phase separation phenomenon that occurs in perovskite manganites is discussed by means of ab initio calculations. Supercells have been used to simulate a phase-separated state that occurs at Ca concentrations close to the localized-itinerant crossover. We have first considered a model with two types of magnetic ordering coexisting within the same compound. This is not stable. However, a non-isotropic distribution of chemical dopants is found to be the ground state. This leads to regions in the system with different effective concentrations that would always accompany the magnetic phase separation at the same nanometric scale, with hole-rich regions being more ferromagnetic in character and hole-poor regions being in the antiferromagnetic region of the phase diagram, as long as the system is close to a phase crossover. © 2012 IOP Publishing Ltd.

Fernandez Albor V.,University of Santiago de Compostela | Saborido J.,University of Santiago de Compostela | Gomez-Folgar F.,Galicia Supercomputing Center | Lopez Cacheiro J.,Galicia Supercomputing Center | Graciani Diaz R.,University of Barcelona
Proceedings - 2011 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2011 | Year: 2011

Grid is one option that researchers are already using to submit their scientific simulations. Several organizations such as CERN (European Organization for Nuclear Research) or EMBL (European Molecular Biology Laboratory) are currently using grid to run a large part of their simulation jobs. The increasing availability of cloud resources is now leading the scientific community to shift its focus from grid to cloud as a way to extend the pool of resources where they can run their jobs. Unfortunately, running scientific jobs in the cloud usually requires starting again from the beginning and learning how to use new interfaces. CERN's LHCb experiment has developed a software framework called DIRAC (Distributed Infrastructure with Remote Agent Control), which provides researchers with an environment for running their jobs and getting the results through a browser. CloudStack is an open source cloud platform, now owned by Citrix Systems, that allows building any type of cloud including public, private, and hybrid. It is the cloud management software selected in the FORMIGA CLOUD project to manage non-dedicated computer lab resources belonging to different Spanish universities. This article explains the work involved in the integration between CloudStack and the grid framework DIRAC. This integration will allow users to use cloud resources transparently through a common interface. © 2011 IEEE.

Pichel J.C.,Galicia Supercomputing Center | Lorenzo J.A.,University of Santiago de Compostela | Heras D.B.,University of Santiago de Compostela | Cabaleiro J.C.,University of Santiago de Compostela | Pena T.F.,University of Santiago de Compostela
Journal of Supercomputing | Year: 2011

In this paper, the sparse matrix-vector product (SpMV) is evaluated on the FinisTerrae SMP-NUMA supercomputer. The particularities of its architecture make the tuning of SpMV especially relevant because of their significant impact on performance. First, we have estimated the influence of data and thread allocation. Moreover, because of the indirect and irregular memory access patterns of SpMV, we have also studied the influence of the memory hierarchy on performance. According to the behavior observed in the study, a set of optimizations specifically tuned for FinisTerrae were successfully applied to SpMV. Noticeable improvements are obtained in comparison with the naïve SpMV implementation. © 2010 Springer Science+Business Media, LLC.
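
For readers unfamiliar with the kernel, the following is a minimal sketch of a naïve SpMV, assuming the compressed sparse row (CSR) storage format; the abstract does not name the format or the code used in the study, and the function name `spmv_csr` and the 3×3 example matrix are purely illustrative. The access `x[col_idx[k]]` is the indirect, irregular memory access the abstract refers to.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access: the position read in x depends on col_idx.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [7.0, 6.0, 17.0]
```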

Pichel J.C.,Galicia Supercomputing Center | Heras D.B.,University of Santiago de Compostela | Cabaleiro J.C.,University of Santiago de Compostela | Garcia-Loureiro A.J.,University of Santiago de Compostela | Rivera F.F.,University of Santiago de Compostela
International Journal of High Performance Computing Applications | Year: 2010

Irregular codes are present in many scientific applications, such as finite element simulations. In these simulations the solution of large sparse linear equation systems is required, which are often solved using iterative methods. The main kernel of the iterative methods is the sparse matrix-vector multiplication which frequently demands irregular data accesses. Therefore, techniques that increase the performance of this operation will have a great impact on the global performance of the iterative method and, as a consequence, on the simulations. In this paper a technique for improving the locality of sparse matrix codes is presented. The technique consists of reorganizing the data guided by a locality model instead of restructuring the code or changing the sparse matrix storage format. We have applied our proposal to different iterative methods provided by two standard numerical libraries. Results show an impact on the overall performance of the considered iterative method due to the increase in the locality of the sparse matrix-vector product. Noticeable reductions in the execution time have been achieved both in sequential and in parallel executions. This positive behavior allows the reordering technique to be successfully applied to real problems. We have focused on the simulation of semiconductor devices and in particular on the BIPS3D simulator. The technique was integrated into the simulator. Both sequential and parallel executions have been analyzed extensively in this paper. Noticeable reductions in the execution time required by the simulations are observed when using our reordered matrices in comparison with the original simulator. © 2010 The Author(s).
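
To make the idea of data reordering concrete, the sketch below relabels the rows and columns of a small symmetric sparsity pattern and compares the matrix bandwidth before and after. This is only an illustration: the abstract does not describe its locality model, so bandwidth is used here as a simple stand-in proxy for locality, and the pattern, the hand-picked permutation and the helper names `bandwidth` and `permute` are all assumptions.

```python
def bandwidth(entries):
    """Largest distance |i - j| of any nonzero (i, j) from the diagonal."""
    return max(abs(i - j) for i, j in entries)


def permute(entries, new_index):
    """Relabel rows and columns: old index i becomes new_index[i]."""
    return {(new_index[i], new_index[j]) for i, j in entries}


# Off-diagonal nonzeros of a symmetric 5x5 pattern whose graph is the
# path 0 - 3 - 1 - 4 - 2 (each edge stored in both directions).
edges = [(0, 3), (3, 1), (1, 4), (4, 2)]
pattern = {(i, j) for i, j in edges} | {(j, i) for i, j in edges}

# Hand-picked permutation that numbers the unknowns along the path.
new_index = [0, 2, 4, 1, 3]
reordered = permute(pattern, new_index)

print(bandwidth(pattern))    # 3
print(bandwidth(reordered))  # 1
```

After reordering, every nonzero lies next to the diagonal, so a row of the sparse matrix-vector product touches neighboring entries of the vector. The matrix is the same up to relabeling; only the data layout changes, which matches the abstract's point that neither the code nor the storage format needs to be modified.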

Gonzalez-Dominguez J.,University of La Coruña | Martin M.J.,University of La Coruña | Taboada G.L.,University of La Coruña | Tourino J.,University of La Coruña | And 3 more authors.
Concurrency Computation Practice and Experience | Year: 2012

The popularity of Partitioned Global Address Space (PGAS) languages has increased in recent years thanks to their high programmability and performance through an efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations using the PGAS Unified Parallel C language. The routines developed in UPCBLAS are built on top of sequential Basic Linear Algebra Subprograms (BLAS) functions and exploit the particularities of the PGAS paradigm, taking into account data locality in order to achieve good performance. Furthermore, the routines implement other optimization techniques, several of them by automatically taking into account the hardware characteristics of the underlying systems on which they are executed. The library has been experimentally evaluated on a multicore supercomputer and compared with a message-passing-based parallel numerical library, demonstrating good scalability and efficiency. Copyright © 2012 John Wiley & Sons, Ltd.

Gomez-Folgar F.,Galicia Supercomputing Center | Cacheiro J.L.,Galicia Supercomputing Center | Sanchez C.F.,Galicia Supercomputing Center | Garcia-Loureiro A.,University of Santiago de Compostela | Valin R.,University of Santiago de Compostela
Proceedings of the 8th Spanish Conference on Electron Devices, CDE'2011 | Year: 2011

We propose the development of a new e-Science infrastructure that would take the best of both grid and cloud technologies, and would allow different research groups that perform nanoelectronic simulations to share their local clusters and create a common infrastructure accessible through a unified point of access. More computational power can therefore be used to perform nanoelectronic simulations, with a consequent reduction in the time required to obtain the results. The integration of local clusters to share resources, through the proposed cloud management stack, will allow the deployment of an elastic infrastructure that will also allow local computing tasks to be prioritized over shared ones. Furthermore, it will allow not only the deployment of ad hoc virtual machines across local sites to carry out specific tasks but also the deployment of virtual machines in public clouds like Amazon AWS to obtain additional computing resources, and even the avoidance of data loss by using public storage clouds like Amazon S3. © 2011 IEEE.

Pichel J.C.,University of Santiago de Compostela | Rivera F.F.,University of Santiago de Compostela | Fernandez M.,Galicia Supercomputing Center | Rodriguez A.,Galicia Supercomputing Center
Microprocessors and Microsystems | Year: 2012

It is well known that reordering techniques applied to sparse matrices are common strategies to improve the performance of sparse matrix operations, particularly the sparse matrix-vector multiplication (SpMV) on CPUs. In this paper, we have evaluated some of the most successful reordering techniques on two different GPUs. In addition, a number of sparse matrix storage formats were considered in our study. Executions for both single and double precision arithmetic were also performed. We have found that SpMV is very sensitive to the application of reordering techniques on GPUs. In particular, several characteristics of the reordered matrices that have a big impact on the SpMV performance have been detected. In most cases, reordered matrices outperform the original ones, showing noticeable speedups of up to 2.6×. We have also observed that no single storage format is preferred over the others. © 2011 Elsevier B.V. All rights reserved.
