Wittmann M., Hager G., Zeiser T., Treibig J., Wellein G. | Erlangen Regional Computing Center (RRZE), Martensstr. 1, 91058 Erlangen, Germany
Concurrency and Computation: Practice and Experience | Year: 2015

Memory-bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice Boltzmann method on an Intel Sandy Bridge cluster as a prototype scenario to investigate if and how single-chip performance and power characteristics can be generalized to the highly parallel case. First, we perform an analysis of a sparse-lattice lattice Boltzmann method implementation for complex geometries. Using a single-core performance model, we predict the intra-chip saturation characteristics and the optimal operating point in terms of energy-to-solution as a function of implementation details, clock frequency, vectorization, and number of active cores per chip. We show that high single-core performance and a correct choice of the number of active cores per chip are the essential optimizations for the lowest energy-to-solution at minimal performance degradation. Then we extrapolate to the Message Passing Interface (MPI)-parallel level and quantify the energy-saving potential of various optimizations and execution modes, where we find these guidelines to be even more important, especially when communication overhead is non-negligible. In our setup, we could achieve energy savings of 35% in this case, compared with a naive approach. We also demonstrate that a simple non-reflective reduction of the clock speed leaves most of the energy-saving potential unused. © 2015 John Wiley & Sons, Ltd.
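The idea of an optimal operating point for energy-to-solution can be illustrated with a toy model. The sketch below is not the model from the paper: it assumes that chip performance scales linearly with active cores and clock frequency until it hits a memory-bandwidth ceiling, and that chip power is a constant baseline plus a per-core term that grows quadratically with frequency. All function names and numeric parameters are hypothetical values chosen for illustration only.

```python
# Toy model of energy-to-solution on a memory-bound multicore chip.
# ASSUMPTION: every numeric parameter here is invented for illustration
# and is not a measurement or a fitted value from the paper.

def performance(n_cores, f_ghz, p_core_per_ghz=40.0, p_sat=150.0):
    """Chip performance in MLUP/s: linear in cores and clock until the
    (assumed) memory-bandwidth ceiling p_sat is reached."""
    return min(n_cores * p_core_per_ghz * f_ghz, p_sat)


def power(n_cores, f_ghz, w0=25.0, w1=1.0, w2=1.5):
    """Chip power in W: baseline w0 plus a per-core part that grows
    linearly and quadratically with clock frequency (assumed form)."""
    return w0 + n_cores * (w1 * f_ghz + w2 * f_ghz ** 2)


def energy_to_solution(n_cores, f_ghz, work_mlup=1.0e4):
    """Energy in J to process a fixed amount of work (million lattice updates)."""
    runtime_s = work_mlup / performance(n_cores, f_ghz)
    return power(n_cores, f_ghz) * runtime_s


def best_operating_point(max_cores=8, freqs_ghz=(1.2, 1.6, 2.0, 2.4, 2.7)):
    """Scan all (cores, frequency) pairs; return (energy, cores, frequency)
    for the pair with the lowest energy-to-solution."""
    candidates = [
        (energy_to_solution(n, f), n, f)
        for n in range(1, max_cores + 1)
        for f in freqs_ghz
    ]
    return min(candidates)


if __name__ == "__main__":
    energy, cores, freq = best_operating_point()
    naive = energy_to_solution(8, 2.7)  # all cores at the highest clock
    print(f"best: {cores} cores at {freq} GHz, {energy:.0f} J")
    print(f"naive: 8 cores at 2.7 GHz, {naive:.0f} J")
```

In this toy setting, cores added beyond the saturation point draw power without adding performance, so energy-to-solution rises. The same mechanism is why the number of active cores per chip matters for memory-bound codes.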
