Large Scale Simulation Research Laboratory

Pathum Thani, Thailand

Kijsipongse E.,Large Scale Simulation Research Laboratory | U-Ruekolan S.,Large Scale Simulation Research Laboratory
2014 11th Int. Joint Conf. on Computer Science and Software Engineering: "Human Factors in Computer Science and Software Engineering" - e-Science and High Performance Computing: eHPC, JCSSE 2014 | Year: 2014

The MapReduce framework is commonly used to perform large-scale data processing, such as social network analysis, data mining and machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost-effective if the system is underutilized. To speed up MapReduce computation at low cost, the computing resources donated by idle desktop and notebook computers in an organization offer real potential. The MapReduce framework is therefore implemented in a Volunteer Computing environment so that such data processing tasks can be carried out on the unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problems in Volunteer Computing, so that MapReduce jobs always run in a unified, isolated runtime environment. This paper presents a Hadoop cluster that can be scaled into a virtualized Volunteer Computing environment. The system consists of a small fixed set of dedicated nodes plus a variable number of volatile volunteer nodes that give additional computing power to the cluster. To this end, we consolidate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. We evaluate the proposed system on our testbed with MapReduce benchmarks that represent different workload patterns. The performance of the Hadoop cluster is measured as its computing capability is expanded with volunteer nodes. The results show that the system scales better for CPU-intensive jobs than for data-intensive jobs, whose scalability is more restricted. © 2014 IEEE.
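
For readers unfamiliar with the programming model, the following is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the job. It is an illustration only: it does not use the Hadoop or BOINC APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence inside a single process."""
    pairs = []
    for document in documents:
        # Each map call depends only on its own input split.
        pairs.extend(map_phase(document))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Each call to `map_phase` is independent of the others, which is the property that lets a framework hand map tasks to whichever nodes happen to be available.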

Sirisup S.,Large Scale Simulation Research Laboratory | Maleewong M.,Kasetsart University
Modelling and Simulation in Engineering | Year: 2012

This paper presents a projective integration method based on the Galerkin-free framework and assisted by proper orthogonal decomposition (POD). The method is applied to simulate two-dimensional incompressible flow past a NACA0012 airfoil. The approach uses high-accuracy direct numerical simulations over short time intervals, from which POD modes are extracted to approximate the dynamics of the primary variables. The solution is then projected forward with larger time steps using any standard time integrator, without the need to recompute it from the governing equations. This is called the online projective integration method. The results of the projective integration method are in good agreement with the full-scale simulation at lower computational cost. We also study the individual role of each POD mode used in the projective integration method. The first POD mode captures the basic flow behavior, but the overall dynamics are rather inaccurate. The second and third POD modes assist the first by correcting the magnitudes and phases of the vorticity fields. However, adding the fifth POD mode to the model leads to incorrect results, in the form of phase shifts, for both the drag and lift coefficients. This suggests that there is an optimal number of POD modes to use in the projective integration method. © 2012 Sirod Sirisup and Montri Maleewong.
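
The alternation between short accurate bursts and large projected steps can be sketched on a scalar ODE. The example below is a projective forward Euler scheme standing in for the paper's method: it has no POD and no flow solver, and the test problem dy/dt = -y, the step sizes and the function name `projective_euler` are all assumptions made for illustration.

```python
import math


def projective_euler(f, y0, h, inner_steps, jump, n_outer):
    """Alternate short bursts of small steps with one large projected step.

    f           -- right-hand side of dy/dt = f(y)
    h           -- small step used inside each burst (the "fine" solver)
    inner_steps -- number of small steps per burst (must be >= 1)
    jump        -- size of the large extrapolation step
    n_outer     -- number of burst-plus-jump cycles
    """
    t, y = 0.0, y0
    for _ in range(n_outer):
        y_prev = y
        for _ in range(inner_steps):
            y_prev = y
            y = y + h * f(y)  # fine-scale forward Euler step
            t += h
        # Estimate the slow time derivative from the last two burst states.
        slope = (y - y_prev) / h
        # Project forward without evaluating the governing equation again.
        y = y + jump * slope
        t += jump
    return t, y


# Ten cycles of five small steps (h = 0.01) plus one jump of 0.05 reach t = 1.
t_end, y_end = projective_euler(lambda y: -y, 1.0, 0.01, 5, 0.05, 10)
error = abs(y_end - math.exp(-t_end))
```

In this configuration the right-hand side is evaluated for only half of the simulated time span, and the other half is covered by extrapolation.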

Prueksaaroon S.,King Mongkut's University of Technology Bangkok | Prueksaaroon S.,Large Scale Simulation Research Laboratory | Varavithya V.,King Mongkut's University of Technology Bangkok
International Journal of Advancements in Computing Technology | Year: 2012

Computer clusters have become a mainstream high performance computing platform. To harness these systems, cluster operating environments have added a further layer of abstraction through virtualization technology. Specific problem-solving environments are isolated at the operating system level, and the actual execution takes place in virtualization domains. Virtualization technology not only increases the utilization of computing resources but also reduces configuration workload, administrative cost, application porting effort and energy consumption. In this work, we investigate an implementation of a virtualized cluster and present its experimental performance results. We describe a framework for adopting virtualization on clusters and discuss the management of a virtualized cluster. Definitions for the operations on virtual clusters are given. Based on these definitions, we propose a virtual cluster scheduler in which newly introduced provisioning factors and a goodness factor extend the flexibility of cluster management. The virtual cluster scheduler can be used as a basis for implementing virtual cluster management. The software layers for virtual cluster management are discussed, and the job submission workflow is explained. The framework for organizing cluster resources in a virtualization environment is described.

Sirisup S.,Large Scale Simulation Research Laboratory | U-Ruekolan S.,Large Scale Simulation Research Laboratory
ECTI-CON 2010 - The 2010 ECTI International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology | Year: 2010

The Global Arrays toolkit is a library that allows programmers to write parallel programs that use large arrays distributed across processing nodes through the Aggregate Remote Memory Copy Interface (ARMCI). OpenMP is an application programming interface that supports shared-memory multiprocessing on many architectures and platforms. On symmetric multiprocessor (SMP) systems, the Global Arrays toolkit presents programmers with a model quite similar to that provided by OpenMP. In this study, we extend our investigation of the performance of a parallel application implemented with the Global Arrays toolkit and with OpenMP in a Grid computing environment. The investigation focuses on the case in which an SMP cluster is included in the Grid computing environment. Multi-level parallelism together with multi-level topology-aware techniques is used in both implementations. We have found that the performance of the evaluated application implemented with Global Arrays is comparable to that of the application implemented with OpenMP. This implies that a programmer can port a Global Arrays application directly to the SMP cluster without a drop in performance compared to the native implementation.

Kijsipongse E.,Large Scale Simulation Research Laboratory | U-Ruekolan S.,Large Scale Simulation Research Laboratory
2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2012 | Year: 2012

In recent years, high performance computing (HPC) resources have grown rapidly and become more diverse. The next generation of HPC platforms is assembled from resources of various types, such as multi-core CPUs and GPUs. Thus, developing a parallel program that fully utilizes heterogeneously distributed resources in an HPC environment is a challenge. A parallel program should be portable and able to run efficiently on all types of computing resources with the least effort. We combine the advantages of Global Arrays and OpenCL for such parallel programs. We employ OpenCL to implement parallel applications at the fine-grain level so that they can execute across heterogeneous platforms. At the coarse-grain level, we use Global Arrays for efficient data communication between computing resources in terms of virtually shared memory. In addition, we propose a load balancing technique based on the task pool model for hybrid OpenCL/Global Arrays applications on heterogeneous platforms to improve application performance. © 2012 IEEE.

Kijsipongse E.,Large Scale Simulation Research Laboratory | U-Ruekolan S.,Large Scale Simulation Research Laboratory
JCSSE 2012 - 9th International Joint Conference on Computer Science and Software Engineering | Year: 2012

K-Means is a clustering algorithm that is widely used in many areas, such as information retrieval, computer vision and pattern recognition. With the recent advances in General Purpose Graphics Processing Units (GPGPU), a modern GPU capable of Tflops-scale computation can be used to calculate K-Means clustering on average-sized problems. However, due to the exponential growth of data, K-Means clustering on a single GPU will not be adequate for large datasets in the near future. In this paper, we present the design and implementation of an efficient large-scale parallel K-Means on GPU clusters. We use the massive parallelism of GPUs to speed up the most time-consuming part of K-Means clustering in each node. We employ dynamic load balancing to distribute the workload equally across the different GPUs installed in the cluster so as to improve the performance of the parallel K-Means at the inter-node level. We also take advantage of software distributed shared memory to simplify communication and collaboration among nodes. The evaluation shows that maintaining load balance on GPU clusters improves the performance of the parallel K-Means. © 2012 IEEE.
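
As a reference point for the algorithm being parallelized, here is a minimal sequential sketch of Lloyd's K-Means iteration in plain Python. It is not the authors' GPU implementation, and the function names are hypothetical. The assignment step is usually the dominant cost because it compares every point with every centroid; treating it as the part offloaded to the GPU is an assumption, since the abstract does not name it.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared distance)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of Lloyd iterations from the given centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: independent per point, hence easy to parallelize.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for index, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == index]
            if members:
                updated.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))
                ))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        centroids = updated
    return centroids, labels
```

A call such as `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` separates the two obvious groups of points.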

Pakornchote T.,Chulalongkorn University | Bovornratanaraks T.,Chulalongkorn University | Vannarat S.,Large Scale Simulation Research Laboratory | Pinsook U.,Chulalongkorn University
Solid State Communications | Year: 2016

We investigate the wave-like arrangements of H atoms around the metal plane (Hm) in the ScH3 hcp phase using the ab initio method. We found that only the P63/mmc, P3¯c1, P63cm and P63 phases are energetically favorable. The wave-like arrangement allows the H atoms to occupy off-site symmetry positions and leads to substantial changes in the pair distribution between Sc and H atoms, which are associated with changes in the electronic structure such that the total energy is lowered. The symmetry breaking from P63/mmc is also responsible for the band gap opening. In the P63 structure, the calculated band gap is 0.823 eV and 1.223 eV using the GGA and sX-LDA functionals, respectively. This band gap can be compared with 1.7 eV derived from the optical measurement and 1.55 eV from the HSE06 calculation. Thus, the broken-symmetry structures can be viewed as a Peierls distortion of the P63/mmc structure. Furthermore, we found that only the P63 structure is dynamically stable, unlike YH3, where the P63cm structure is also stable. The stability of P63 comes from sufficiently strong interactions between two neighboring H atoms at their off-site symmetry positions, i.e. near the metal plane and near the tetragonal site. The P63 phonon density of states is in good agreement with the data from the neutron experiment. © 2015 Elsevier Ltd.

Kijsipongse E.,Large Scale Simulation Research Laboratory | U-Ruekolan S.,Large Scale Simulation Research Laboratory
Proceedings of the 2013 10th International Joint Conference on Computer Science and Software Engineering, JCSSE 2013 | Year: 2013

Due to the computational demand of data-intensive applications, parallel computer hardware such as an HPC cluster is required to execute them. However, building large HPC clusters for this sole purpose is not always feasible or cost-effective, since the purchasing, operational and maintenance costs of dedicated systems are high while the systems are not fully utilized most of the time. Volunteer Computing can address this problem, as it provides a large amount of computing resources at no cost. We develop a system that expands the computing capability of HPC clusters using additional computing power donated by volunteer users who wish to give the computing resources of their unused desktop computers to help execute jobs in the HPC clusters. The proposed system combines the native cluster compute nodes with a set of non-dedicated compute nodes contributed by volunteers. The experiments demonstrate that volunteer resources, such as CPU time, disk storage and GPUs, can be seamlessly integrated into HPC clusters, allowing the systems to scale up or down dynamically according to the amount of resources available in Volunteer Computing. © 2013 IEEE.

Kijsipongse E.,Large Scale Simulation Research Laboratory | U-Ruekolan S.,Large Scale Simulation Research Laboratory | Ngamphiw C.,Genome Institute | Tongsima S.,Genome Institute
Proceedings of the 2011 8th International Joint Conference on Computer Science and Software Engineering, JCSSE 2011 | Year: 2011

The calculation of pairwise correlation coefficients on a dataset, known as the correlation matrix, is often used in data analysis, signal processing, pattern recognition, image processing and bioinformatics. With state-of-the-art Graphics Processing Units (GPUs), which consist of a massive number of cores capable of processing at up to several Gflops, the calculation of the correlation matrix can be accelerated several times over traditional CPUs. However, due to the rapid growth of data in the digital era, the correlation matrix calculation becomes so computing-intensive that it needs to be executed on multiple GPUs. GPUs are now common components in data centers at many institutions, and their deployment tends towards GPU clusters in which each node is equipped with GPUs. In this paper, we propose a parallel computing approach based on hybrid MPI/CUDA programming for fast and efficient Pearson correlation matrix calculation on GPU clusters. At the coarse-grain level of parallelism, the correlation matrix is partitioned into tiles, which are distributed to execute concurrently on many GPUs using MPI. At the fine-grain level, the CUDA kernel function on each node performs massively parallel computing on a GPU. To balance the load across all GPUs, we adopt the work pool model, in which a master node manages the tasks in the work pool and dynamically assigns them to worker nodes. The evaluation shows that the proposed work ensures load balance across different GPUs and thus gives better execution time than a simple static data partitioning. © 2011 IEEE.
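
The tiling and work pool ideas described above can be sketched on a single machine. In the example below a thread pool stands in for the MPI master and workers, and a plain Python loop stands in for the CUDA kernel; neither reflects the authors' actual code. The tile size, the worker count and all function names are assumptions, and every input row is assumed to be non-constant so that its norm is non-zero.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def standardize(row):
    """Center a row and scale it to unit length (assumes a non-constant row)."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def tile_tasks(n, tile):
    """Yield the top-left corner of every tile in the upper triangle."""
    for i0 in range(0, n, tile):
        for j0 in range(i0, n, tile):
            yield i0, j0


def compute_tile(z, i0, j0, tile):
    """Compute all correlations inside one tile (stand-in for a GPU kernel)."""
    n = len(z)
    results = []
    for i in range(i0, min(i0 + tile, n)):
        for j in range(j0, min(j0 + tile, n)):
            # Pearson r is the dot product of two standardized rows.
            results.append((i, j, sum(a * b for a, b in zip(z[i], z[j]))))
    return results


def correlation_matrix(rows, tile=2, workers=4):
    """Fill the symmetric matrix from tiles handed out by a work pool."""
    z = [standardize(row) for row in rows]
    n = len(z)
    corr = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Idle workers pull the next queued tile, which balances the load.
        futures = [pool.submit(compute_tile, z, i0, j0, tile)
                   for i0, j0 in tile_tasks(n, tile)]
        for future in futures:
            for i, j, r in future.result():
                corr[i][j] = r
                corr[j][i] = r  # exploit symmetry for the lower triangle
    return corr
```

Only tiles on or above the diagonal are scheduled, since the matrix is symmetric; whether the paper exploits symmetry in the same way is not stated in the abstract.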

Sirisup S.,Large Scale Simulation Research Laboratory | Tomkratoke S.,Large Scale Simulation Research Laboratory | Lertapisit W.,Large Scale Simulation Research Laboratory
OCEANS 2016 - Shanghai | Year: 2016

Ocean currents are one of the most important factors in the marine environment. They affect the availability of nutrients and food and the spread of eggs and larvae for marine animals. Ocean currents are also the main driving forces of sediment transport and coastal processes, which are key factors in coastal erosion and accretion as well as marine pollution. Thus, analyzing ocean current information, especially in shallow seas, will benefit ocean scientists in deriving new knowledge on ocean processes and variability, fishery and aquaculture scientists in incorporating the larval distribution of fish into improved management plans, and management authorities in deriving and implementing more effective coastal management schemes. In this work, we first aim to simulate the ocean circulation in the Gulf of Thailand. The region has complex coastlines and seafloor topography, which can result in complicated ocean currents. To handle this, we employ the unstructured-grid Finite-Volume Coastal Ocean Model (FVCOM), which offers geometric flexibility. For model validation, we use surface current data measured by high frequency surface wave radar (HFSWR) from the Geo-Informatics and Space Technology Development Agency (GISTDA). The RMS error shows that the model output agrees well with the observations. We also compare the overall characteristics of the simulated currents with those of the observed currents through Empirical Orthogonal Function (EOF) analysis. The results indicate that the dominant characteristics of the simulated ocean currents are analogous to those of the observations. To sum up, these results indicate that the model can predict and explain the behavior of the ocean currents precisely and realistically. Secondly, we analyze the EOF patterns of the ocean circulation to provide insight into the dynamics and spatial structure of the ocean currents in the Gulf of Thailand. © 2016 IEEE.
