Berekmeri M. | CNRS GIPSA Laboratory | Distributed Computer Systems Group | French National Center for Scientific Research | University Grenoble Alpes
And 12 more authors.
IFAC Proceedings Volumes (IFAC-PapersOnline) | Year: 2014
We are at the dawn of a huge data explosion, and companies therefore have fast-growing amounts of data to process. For this purpose Google developed MapReduce, a parallel programming paradigm that is slowly becoming the de facto tool for Big Data analytics. Although its use is already widespread in industry, ensuring performance constraints for such a complex system poses great challenges, and its management requires a high level of expertise. This paper addresses these challenges by providing the first autonomous controller that ensures service time constraints of a concurrent MapReduce workload. We develop the first dynamic model of a MapReduce cluster. Furthermore, a PI feedback controller is developed and implemented to ensure service time constraints. A feedforward controller is added to improve the control response in the presence of disturbances, namely changes in the number of clients. The approach is validated online on a real 40-node MapReduce cluster running a data-intensive Business Intelligence workload. Our experiments demonstrate that the designed control succeeds in meeting service time constraints. © IFAC.
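The abstract does not give the controller equations, so the following is only a minimal sketch of the general technique it names: a discrete-time PI feedback loop on service time, combined with a static feedforward term driven by the number of clients. It assumes that the control input is the number of cluster nodes, which the abstract does not state. The class `PIFeedforwardController`, the gains `kp`, `ki`, `kff`, the baseline operating point, and the helper `toy_service_time` are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PIFeedforwardController:
    """Discrete-time PI controller with a static feedforward term.

    Assumption: the actuator is the number of nodes. Gains and limits
    are illustrative, not values from the paper.
    """
    kp: float             # proportional gain (nodes per second of error)
    ki: float             # integral gain (nodes per accumulated second of error)
    kff: float            # feedforward gain (nodes per additional client)
    min_nodes: int = 1
    max_nodes: int = 40   # cluster size mentioned in the abstract
    integral: float = 0.0

    def update(self, target_s: float, measured_s: float, clients: int,
               baseline_clients: int, baseline_nodes: int) -> int:
        # Positive error means jobs are slower than the target.
        error = measured_s - target_s
        candidate = self.integral + error
        raw = (baseline_nodes
               + self.kp * error                           # feedback: P term
               + self.ki * candidate                       # feedback: I term
               + self.kff * (clients - baseline_clients))  # feedforward
        # Simple anti-windup: accumulate only while the output is unsaturated.
        if self.min_nodes <= raw <= self.max_nodes:
            self.integral = candidate
        clamped = min(max(raw, self.min_nodes), self.max_nodes)
        return math.ceil(clamped)  # round up to a whole number of nodes


def toy_service_time(clients: int, nodes: int,
                     work_per_client_s: float = 20.0) -> float:
    """Hypothetical static plant: service time grows with load per node."""
    return work_per_client_s * clients / nodes


if __name__ == "__main__":
    ctrl = PIFeedforwardController(kp=0.5, ki=0.1, kff=2.0)
    nodes = 10
    for step in range(10):
        clients = 5 if step < 3 else 10  # disturbance: client count doubles
        measured = toy_service_time(clients, nodes)
        nodes = ctrl.update(10.0, measured, clients,
                            baseline_clients=5, baseline_nodes=10)
        print(step, clients, round(measured, 2), nodes)
```

In this sketch the feedforward term reacts to a change in the client count as soon as it is measured, while the PI terms correct whatever service time error remains. A real cluster has dynamics (for example, delays while nodes start and jobs drain) that the static toy plant ignores.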