Sanjuan G., Autonomous University of Barcelona | Tena C., Barcelona Supercomputing Center | Margalef T., Autonomous University of Barcelona | Cortes A., Autonomous University of Barcelona
Journal of Supercomputing | Year: 2016

Wind field calculation is critical for accurate forest fire propagation predictions. However, when the terrain map involved is large, the memory footprint and execution time can prevent these calculations from being useful in an operational environment. Wind field calculation involves sparse matrices that are usually stored in the CSR (Compressed Sparse Row) format. With this format, sparse matrix-vector multiplication can become a bottleneck because of the number of cache misses it incurs. Moreover, the matrices involved are extremely sparse and follow a very well-defined pattern. Therefore, a new storage format has been designed to reduce memory requirements and cache misses in this particular sparse matrix-vector multiplication. Sparse matrix-vector multiplication has been implemented using this new storage format, exploiting the inherent parallelism of the operation. The new method has been implemented in OpenMP, MPI and CUDA and tested on different hardware configurations. The results are very promising: both execution time and memory requirements are significantly reduced. © 2016 The Author(s)
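The abstract does not describe the new storage format itself, so the sketch below shows only the CSR baseline it is compared against. It is a minimal Python illustration, not the authors' code; the function name `csr_spmv` and the array names `values`, `col_idx` and `row_ptr` are hypothetical. The indirect access `x[col_idx[k]]` is where scattered column indices can cause cache misses, and because each output row is computed independently, the outer loop over rows is the natural place to split work across threads.

```python
from typing import List


def csr_spmv(values: List[float], col_idx: List[int],
             row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i's nonzeros occupy values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Rows are independent, so this loop is the one a parallel version would split.
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect (gather) access into x: the source of irregular memory traffic.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

For the 3-by-3 matrix [[4, 0, 1], [0, 2, 0], [1, 0, 3]], the CSR arrays are `values = [4, 1, 2, 1, 3]`, `col_idx = [0, 2, 1, 0, 2]` and `row_ptr = [0, 2, 3, 5]`; multiplying by `x = [1, 2, 3]` gives `[7.0, 4.0, 10.0]`.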


Alvanos M., Barcelona Supercomputing Center | Alvanos M., Polytechnic University of Catalonia | Alvanos M., IBM | Farreras M., Polytechnic University of Catalonia | And 4 more authors.
Proceedings of the International Conference on Supercomputing | Year: 2013

The goal of Partitioned Global Address Space (PGAS) languages is to improve programmer productivity on large-scale parallel machines. However, PGAS programs may contain many fine-grained shared accesses that degrade performance. Manual code transformations or compiler optimizations are required to improve the performance of programs with fine-grained accesses. The downside of manual code transformations is the increased program complexity, which hinders programmer productivity. On the other hand, most compiler optimizations for fine-grained accesses require knowledge of the physical data mapping and the use of parallel loop constructs. This paper presents an optimization for the Unified Parallel C (UPC) language that combines compile-time (static) and runtime (dynamic) coalescing of shared data, without knowledge of the physical data mapping. Larger messages increase network efficiency, and static coalescing decreases the overhead of library calls. The performance evaluation uses two microbenchmarks and three benchmarks to obtain scaling and absolute performance numbers on up to 32,768 cores of a Power 775 machine. Our results show that the compiler transformation yields speedups from 1.15X to 21X over the baseline versions and achieves up to 63% of the performance of the MPI versions. © 2013 ACM.
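The runtime (dynamic) half of the coalescing idea can be sketched as follows: instead of issuing one remote read per fine-grained access, requests are buffered per owning thread and sent as one larger message per owner. This is a hypothetical Python model, not the UPC compiler or runtime described in the paper; `CoalescingReader`, `owner_of` and `fetch_bulk` are illustrative names, and ownership is looked up at run time rather than assumed at compile time. Static (compile-time) coalescing is not modelled.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class CoalescingReader:
    """Hypothetical runtime helper that batches fine-grained shared reads
    so each owning thread receives one message instead of many."""

    def __init__(self,
                 owner_of: Callable[[int], int],
                 fetch_bulk: Callable[[int, List[int]], List[float]]) -> None:
        self.owner_of = owner_of      # global index -> owning thread, queried at run time
        self.fetch_bulk = fetch_bulk  # models one network message per call
        self.pending: Dict[int, List[int]] = defaultdict(list)

    def request(self, index: int) -> None:
        # Defer the access instead of issuing a remote get immediately.
        self.pending[self.owner_of(index)].append(index)

    def flush(self) -> Dict[int, float]:
        # Send one aggregated request per owner and scatter the replies.
        results: Dict[int, float] = {}
        for owner, indices in self.pending.items():
            values = self.fetch_bulk(owner, indices)
            for idx, val in zip(indices, values):
                results[idx] = val
        self.pending.clear()
        return results
```

With 16 elements distributed cyclically over 4 threads (`owner_of = lambda i: i % 4`, an assumed layout), requesting all 16 elements and then calling `flush()` results in 4 calls to `fetch_bulk` instead of 16.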


Blanco R., Yahoo! | Bortnikov E., Yahoo! | Junqueira F., Yahoo! | Lempel R., Yahoo! | And 2 more authors.
Proceedings of the 19th International Conference on World Wide Web, WWW '10 | Year: 2010

A Web search engine must update its index periodically to incorporate changes to the Web, and in this work we argue that index updates fundamentally affect the design of search engine result caches. Index updates lead to the problem of cache invalidation: invalidating cached entries for queries whose results have changed. To enable efficient invalidation of cached results, we propose a framework for developing invalidation predictors, along with several concrete predictors. Evaluation using Wikipedia documents and a query log from Yahoo shows that selective invalidation of cached search results can lower the number of query re-evaluations by as much as 30% compared to a baseline time-to-live (TTL) scheme, while returning results of similar freshness. © 2010 Copyright is held by the author/owner(s).
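To make the contrast between a time-to-live scheme and selective invalidation concrete, here is a hypothetical result cache supporting both. The predictor in `on_document_update` is a deliberately simple stand-in, not one of the predictors proposed in the paper: an updated document invalidates a cached query if it already appears in that query's results, or if it contains every query term (assuming conjunctive matching) and so might now enter the results. All class and method names are illustrative.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


class ResultCache:
    """Hypothetical search-result cache with a TTL baseline and
    selective invalidation driven by index updates."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        # query terms -> (cached result doc ids, time the entry was stored)
        self.entries: Dict[FrozenSet[str], Tuple[List[str], float]] = {}

    def put(self, terms: FrozenSet[str], results: List[str], now: float) -> None:
        self.entries[terms] = (results, now)

    def get(self, terms: FrozenSet[str], now: float) -> Optional[List[str]]:
        entry = self.entries.get(terms)
        if entry is None:
            return None
        results, stored_at = entry
        if now - stored_at > self.ttl:
            # TTL baseline: expire by age, regardless of index changes.
            del self.entries[terms]
            return None
        return results

    def on_document_update(self, doc_id: str, doc_terms: FrozenSet[str]) -> int:
        """Illustrative predictor: drop only entries the update may affect.
        Returns the number of invalidated entries."""
        stale = [q for q, (results, _) in self.entries.items()
                 if doc_id in results or q <= doc_terms]
        for q in stale:
            del self.entries[q]
        return len(stale)
```

Entries that an update does not touch stay servable until their TTL expires, which is where selective invalidation can save re-evaluations; a predictor that misses a relevant change, however, can return stale results.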


Eriksson J., Uppsala University | Nocente M., University of Milan Bicocca | Nocente M., CNR Institute for Plasma Physics Piero Caldirola | Binda F., Uppsala University | And 23 more authors.
Nuclear Fusion | Year: 2015

Observations made in a JET experiment aimed at accelerating deuterons to the MeV range by third-harmonic radio-frequency (RF) heating coupled into a deuterium beam are reported. Measurements are based on a set of advanced neutron and gamma-ray spectrometers that, for the first time, observe the plasma simultaneously along vertical and oblique lines of sight. Parameters of the fast-ion energy distribution, such as the high-energy cut-off of the deuteron distribution function and the RF coupling constant, are determined from the data within a uniform analysis framework for neutron and gamma-ray spectroscopy based on a one-dimensional model, and by a consistency check among the individual measurement techniques. A systematic difference is seen between the two lines of sight; it is interpreted as originating from the sensitivity of the oblique detectors to the pitch-angle structure of the distribution around the resonance, which the adopted one-dimensional model does not portray correctly. A framework to calculate neutron and gamma-ray emission from a spatially resolved, two-dimensional deuteron distribution specified in energy and pitch is therefore developed and used for a first comparison with predictions from ab initio models of RF heating at multiple harmonics. The results presented in this paper are relevant to the development of advanced diagnostic techniques for MeV-range ions in high-performance fusion plasmas, with applications to the experimental validation of RF heating codes and, more generally, to studies of the energy distribution of MeV-range ions in high-performance deuterium and deuterium-tritium plasmas. © 2015 EURATOM.


Goel B., Chalmers University of Technology | McKee S.A., Chalmers University of Technology | Gioiosa R., Barcelona Supercomputing Center | Singh K., Cornell University | And 2 more authors.
2010 International Conference on Green Computing, Green Comp 2010 | Year: 2010

Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means of conveying real-time power consumption and temperature data to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models from sampled performance counter values and temperature sensor readings. We develop application-independent models for four different platforms (with four to eight cores), validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and estimates exhibit per-suite median errors of 1.1% to 5.2% on the NAS, SPEC OMP, and SPEC 2006 benchmarks (1.2% to 4.4% overall). ©2010 IEEE.
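A per-core power model of this kind can be sketched as a regression from sampled counter rates and temperature to measured power. This is an assumption for illustration: the abstract does not state the model form, the counters used, or the fitting procedure, so the plain linear least-squares fit and the function names below (`fit_power_model`, `predict_power`, `median_percent_error`) are hypothetical. The last helper computes a median relative error, the kind of metric the abstract reports.

```python
import numpy as np


def _design_matrix(counter_rates: np.ndarray, temps: np.ndarray) -> np.ndarray:
    # Columns: intercept, one column per counter rate, then temperature.
    return np.column_stack([np.ones(len(temps)), counter_rates, temps])


def fit_power_model(counter_rates: np.ndarray, temps: np.ndarray,
                    measured_power: np.ndarray) -> np.ndarray:
    """Fit P = b0 + sum_j b_j * rate_j + b_T * T by ordinary least squares.

    counter_rates : (n_samples, n_counters) sampled event rates for a core
    temps         : (n_samples,) temperature-sensor readings
    measured_power: (n_samples,) power measured during training runs
    """
    X = _design_matrix(counter_rates, temps)
    coeffs, *_ = np.linalg.lstsq(X, measured_power, rcond=None)
    return coeffs


def predict_power(coeffs: np.ndarray, counter_rates: np.ndarray,
                  temps: np.ndarray) -> np.ndarray:
    return _design_matrix(counter_rates, temps) @ coeffs


def median_percent_error(predicted: np.ndarray, measured: np.ndarray) -> float:
    # Median of |predicted - measured| / measured, expressed in percent.
    return float(np.median(np.abs(predicted - measured) / measured) * 100.0)
```

At run time, a power-aware resource manager could call `predict_power` on each core's latest counter sample to estimate per-core power without dedicated hardware support, and feed those estimates into its scheduling decisions.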
