Jakel R.,Center for Information Services and High Performance Computing |
Muller-Pfefferkorn R.,Center for Information Services and High Performance Computing |
Kluge M.,Center for Information Services and High Performance Computing |
Grunzke R.,Center for Information Services and High Performance Computing |
Nagel W.E.,Center for Information Services and High Performance Computing
Advances in Parallel Computing | Year: 2015
The sheer volume of data accumulated in many scientific disciplines, as well as in industry, is a critical issue that requires immediate attention. The handling of large data sets will become a limiting factor, even for data-intensive applications running on future Exascale systems. Today, Big Data is more a collection of challenges for data processing at large scale than a toolbox of solutions for improving applications, scaling well, and handling constantly growing data sets. There is an urgent need for intelligent mechanisms to acquire, process, and analyze data, and these mechanisms have to run and scale efficiently on current and future computing architectures. Complex Big Data applications will profit greatly from flexible workflow systems that consider the full data life cycle, from data acquisition through long-term storage to the curation of knowledge. To maximize the applicability of HPC systems for Big Data workflows, several changes to the system architecture and its software need to be considered. First, in order to exploit all available I/O capacity, an adaptable monitoring system needs to collect information about the I/O patterns of applications and workflows and to provide information for modeling the I/O subsystem. The goal is to collect long-term performance data, to evaluate this data, and finally to show how and why resources cannot be used to their full potential. Second, as the complexity of systems continuously increases, the level of abstraction presented to the user needs to increase at no less than the same rate so that the current level of usability is at least maintained. This is accomplished by employing science gateways as well as workflow and metadata technologies. © 2015 The authors and IOS Press. All rights reserved.