News Article | April 5, 2016
This is the fourth installment in the series covering how researchers from national laboratories and scientific research centers are updating popular molecular dynamics, quantum chemistry and quantum materials code to take advantage of hardware advances, such as the next-generation Intel Xeon Phi processors. Molecular dynamics software, used to simulate the evolution of a system of atoms or other particles, enables researchers to determine the thermodynamic, kinetic and transport properties of various materials. This research tool is used for a wide variety of applications that include developing new and better therapeutics, designing new materials with better properties and improving the efficiency of molecular devices. Molecular dynamics research requires the use of high performance computing systems and specialized HPC software to perform the highly compute-intensive mathematical calculations and to simulate images of the molecular structures. The DL_POLY molecular simulation software package is a widely used classical molecular dynamics (MD) simulation application developed at the Science & Technology Facilities Council (STFC) Daresbury Laboratory. The package is used to model the atomistic evolution of the full spectrum of models commonly employed in the materials science, solid-state chemistry, biological simulation and soft condensed-matter communities. DL_POLY 4 can be executed as a serial or parallel application. The code achieves parallelization that is suitable for homogeneous, distributed-memory, parallel computers. The Irish Centre for High-End Computing (ICHEC) is the national HPC Centre for Ireland, with offices located both in Dublin and Galway, and is hosted by the National University of Ireland, Galway. ICHEC operates the national HPC service providing compute resources and software expertise for the research communities across all the main science disciplines through collaborative partnerships and programs of education and outreach.
ICHEC is recognized as an Intel Parallel Computing Center (PCC) and focuses on optimizing DL_POLY and other software codes to be able to take advantage of hardware features in the Intel Xeon processor and Intel Xeon Phi processor-based systems. According to Dr. Michael Lysaght, Head of Novel Technologies, ICHEC, “DL_POLY is used to study a wide range of molecular dynamics from solids to liquids to biomolecules, as shown in Figures 1 and 2. For our code modernization work on DL_POLY 4, we have focused on two use cases of interest to the material science community, namely an iron simulation at 300 K using NPT Berendsen ensemble with Finnis-Sinclair forces and no electrostatics (250,000 atoms) and 16 Gramicidin A molecules in aqueous solution at 300 K using NPT Berendsen ensemble with SPME and SHAKE/RATTLE algorithm for the constrained motion (792,960 atoms).”
Modifying legacy code for modern hardware
Many of the legacy codes used in molecular dynamics research have been in existence for a long time, and the code is not designed to take advantage of features of new hardware, such as the Intel Xeon Phi coprocessor. As a national HPC center focused on supporting a wide range of communities and a huge number of applications, the ICHEC team finds that one of the main benefits of working with Intel Xeon processors and Intel Xeon Phi coprocessors is that they use common languages, models, and familiar and standard development tools, so there’s no need to learn new languages or tools, and performance-focused improvements are fully portable. “ICHEC believes strongly in the huge impact that the Intel Xeon Phi processor can have on accelerating scientific discovery, particularly in the area of chemistry. Recognizing that the massive performance of the Intel Xeon Phi processor can only be unleashed through significant code modernization, we have focused heavily on preparing several codes across a wide range of domains for current and future generations of the platform.
Much of our research and development work is driven by both national and international collaborative projects, including the European PRACE-RI, EU H2020 projects and ICHEC's Industry Services Program. Working closely with Intel throughout, one of ICHEC's aims is to continue to help our users, partners and clients to remain competitive through the use of current and future generations of Intel Xeon Phi processors,” states Lysaght. The progress the ICHEC team makes in optimizing DL_POLY code will be rolled back into the DL_POLY 4 version of the code so researchers can use the changes without needing to manually change their legacy code. The early-stage efforts at ICHEC have been to enable and optimize DL_POLY 4 to run efficiently on Intel Xeon Phi processor-based systems. Efforts have focused on designing and implementing an efficient hybrid MPI/OpenMP version of the DL_POLY 4 code for the first time. DL_POLY is a highly scalable pure MPI-based code with over 500,000 lines of code. The ICHEC team has evaluated this hybrid version of the code in native, offload and MPI symmetric modes, with a particular focus on the top three most time-consuming algorithms of the code for several real-world test cases that are of interest to the materials science community. These components include the calculation of 'two body forces (TBF)', 'link cell pairs' and 'constraints', all of which are found in other well-known molecular dynamics software packages. “Because these are general methods in this domain, we feel the methods we have applied as part of our focus on DL_POLY 4 will also have relevance to other codes in the community,” indicates Lysaght. The Two Body Forces (TBF) module is one of the most compute-intensive components of DL_POLY 4 (typically taking up more than 30 percent of wall clock time, but this can vary for different problem types). Optimizing the TBF module is where the ICHEC team focused most of their effort. 
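To make the 'link cell pairs' component named above more concrete, the following is a minimal sketch of the generic textbook link-cell technique: particles are binned into cells at least one cutoff wide, so that interacting pairs only need to be searched for in a cell and its 26 periodic neighbours. It is not DL_POLY 4's implementation; the function names (`dist2`, `build_cells`, `neighbour_pairs`, `brute_force_pairs`), the cubic periodic box and the chosen box size and cutoff are all assumptions made for illustration.

```python
import random
from itertools import product
from math import floor


def dist2(a, b, box):
    """Squared minimum-image distance between two points in a cubic periodic box."""
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        d -= box * round(d / box)  # wrap the separation into [-box/2, box/2]
        total += d * d
    return total


def build_cells(positions, box, cutoff):
    """Bin particle indices into cubic cells that are at least `cutoff` wide."""
    n = max(1, int(box // cutoff))  # number of cells along each dimension
    size = box / n
    cells = {}
    for i, p in enumerate(positions):
        key = tuple(int(floor(c / size)) % n for c in p)
        cells.setdefault(key, []).append(i)
    return cells, n


def neighbour_pairs(positions, box, cutoff):
    """Return all pairs (i, j), i < j, closer than `cutoff`, using link cells."""
    cells, n = build_cells(positions, box, cutoff)
    pairs = set()  # a set absorbs duplicates when a small grid wraps onto itself
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple((k + o) % n for k, o in zip(key, offset))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and dist2(positions[i], positions[j], box) < cutoff ** 2:
                        pairs.add((i, j))
    return pairs


def brute_force_pairs(positions, box, cutoff):
    """Reference O(N^2) search, used only to check the link-cell result."""
    count = len(positions)
    return {(i, j) for i in range(count) for j in range(i + 1, count)
            if dist2(positions[i], positions[j], box) < cutoff ** 2}


# Hypothetical test system: 60 random particles in a box of side 10, cutoff 2.5.
random.seed(0)
points = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(60)]
cell_pairs = neighbour_pairs(points, 10.0, 2.5)
reference_pairs = brute_force_pairs(points, 10.0, 2.5)
```

The link-cell search returns the same pairs as the brute-force reference while inspecting only nearby cells, which is why this family of methods appears in many molecular dynamics packages.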
The team implemented an OpenMP parallelization of the TBF stage targeting both Intel Xeon processors and Intel Xeon Phi coprocessors in one code base. The initial OpenMP parallelization scheme exploited OpenMP reduction clauses, which were found to significantly impede scaling over OpenMP threads on the Intel Xeon Phi coprocessor. The team implemented an alternative design of the TBF component, which avoids OpenMP reductions over arrays and which improves performance by a factor of over four times on the Intel Xeon Phi coprocessor relative to the out-of-the-box OpenMP implementation using the OpenMP reduction clause. Modifications that ICHEC is making to TBF stem mainly from employing an alternative ‘direct’ means of calculating the two body forces, making much better use of the vector processing units (VPUs) on the Intel Xeon Phi coprocessor, such that the throughput of two body force calculations (TBFs/s) has improved significantly. The ICHEC TBF work focused on testing the Two Body Forces time-to-solution on an Intel Xeon processor (one and two sockets) and Intel Xeon Phi coprocessors 5110P and 7120P. In Figure 3, the light shading indicates that a tabulated method was used for the two body forces simulations, while the light green shading indicates a direct evaluation of the molecular potential. For the Intel Xeon processor system, ICHEC used 10 MPI processes for one socket and 20 MPI processes for the two-socket case. In the case of the Intel Xeon Phi coprocessor test, the best performance is achieved by using 30 MPI processes each spawning enough threads to fully subscribe two cores each. In the case of the Intel Xeon Phi coprocessor, it can be seen how performance improves by using extra threads on each core, with four threads per core showing the best result on the Intel Xeon Phi coprocessor (shown by the orange bars). 
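The difference between the 'tabulated' and 'direct' ways of evaluating a pair interaction can be sketched as follows. This is an illustration only: it uses a Lennard-Jones potential as a stand-in (the iron test case in the article uses Finnis-Sinclair forces), and the names `lj_energy` and `make_table`, the table range and the table size are assumptions rather than anything taken from DL_POLY 4.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Direct evaluation: compute the pair energy from the closed-form expression."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def make_table(func, r_min, r_max, n_points):
    """Tabulated evaluation: precompute func on a uniform grid, then interpolate."""
    step = (r_max - r_min) / (n_points - 1)
    values = [func(r_min + k * step) for k in range(n_points)]

    def lookup(r):
        x = (r - r_min) / step
        k = min(int(x), n_points - 2)  # index of the grid point at or below r
        frac = x - k
        # Linear interpolation between two neighbouring table entries.
        return values[k] + frac * (values[k + 1] - values[k])

    return lookup


# Hypothetical table covering separations from 0.9 to 2.5 with 2000 entries.
table = make_table(lj_energy, 0.9, 2.5, 2000)
direct_value = lj_energy(1.2)
tabulated_value = table(1.2)
```

The tabulated path replaces arithmetic with data-dependent memory lookups, whereas the direct path is pure arithmetic on the separation. On wide vector units the latter is generally easier to vectorize, which is consistent with the better VPU utilization reported above, although the actual trade-off depends on the potential and the hardware.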
As can be seen in Figure 3, on the Intel Xeon Phi coprocessor 5110P, there is a greater than 23 percent (12 percent on the Intel Xeon processor) improvement in the TBF time-to-solution when ICHEC employed the direct means of calculating the two-body forces versus employing the tabulated means of calculating the forces, which is the approach most typically used in DL_POLY 4. As an overall result of the ICHEC work focused on TBF, the throughput of the two body forces calculations on a single Intel Xeon Phi coprocessor 7120P (in offload mode) is now approaching the TBF/s throughput achieved when running on two Intel Xeon processors E5-2660 v2. The work that ICHEC has done to optimize DL_POLY code is available to other researchers to help save time in manually changing code. In collaboration with STFC, who are the main developers of DL_POLY, a version of the DL_POLY code that contains the ICHEC changes can be accessed by downloading a branch version of the code that is available on CCPForge. However, there are areas where ICHEC believes that improvements need to be made to future HPC hardware and software to enable future exascale research. The ICHEC team sees issues across various projects that are difficult to address with the current HPC systems and codes. For example, many of the chemistry codes reveal workloads that are typically memory bound, as well as comprised of kernels that are often challenging to vectorize efficiently due to heavy branching and indirect array accesses. In addition, communication bound methods, such as distributed fast Fourier transforms (FFTs) will continue to place demands on internode communication at extreme scale. “With these challenges in mind, we are happy to see the standard CPU form factor of the next-generation Intel Xeon Phi processor, the high bandwidth on package memory, as well as the support for Intel Omni-Path connectors, that will be part of the second-generation Intel Xeon Phi processor platforms. 
Continual improvements to the auto-vectorization capability of the Intel compiler will also be welcome with the improved analytical capability of Intel profilers, such as Intel VTune and the Intel Vector Advisor Tool, to aid with deeper performance insights.” “In terms of national research priority areas, we are specifically interested in preparing DL_POLY 4 for the deep petascale/future exascale era and as a state-of-the-art tool for advanced materials researchers. Ireland has a particular interest in this area due to the presence of leading materials science centers, such as the Tyndall National Institute, AMBER, as well as being the headquarters for the E-CAM EU Horizon 2020 HPC Applications Centre of Excellence (of which ICHEC is a member). In close support of researchers in Ireland, our ICHEC team has focused on enabling DL_POLY 4 on Intel Xeon Phi processor-based systems to provide insight in the area of sustainable energy production, which is a grand challenge for society and an integral part of Ireland’s sustainable energy objectives and climate change strategy and where such energy production will require novel materials engineered with atomistic precision,” states Lysaght.
Linda Barney is the founder and owner of Barney and Associates, a technical/marketing writing, training and web design firm in Beaverton, OR.
News Article | August 15, 2016
Quantum computing remains mysterious and elusive to many, but USC Viterbi School of Engineering researchers might have taken us one step closer to bringing the superpowered devices to practical reality. The Information Sciences Institute at USC Viterbi is home to the USC-Lockheed Martin Quantum Computing Center (QCC), a supercooled, magnetically shielded facility specially built to house the first commercially available quantum optimization processors, devices so advanced that there are currently only two in use outside the Canadian company D-Wave Systems, where they were built: the first one went to USC and Lockheed Martin, the second to NASA and Google. Quantum computers encode data in quantum bits, or “qubits,” which can represent the two digits one and zero at the same time, as opposed to traditional bits, which encode either a one or a zero. This property, called superposition, along with the ability of quantum states to “interfere” (cancel or reinforce each other like waves in a pond) and “tunnel” through energy barriers, is what may one day allow quantum processors to perform optimization calculations much faster than is possible using traditional processors. Optimization problems can take many forms, and quantum processors have been theorized to be useful for a variety of machine learning and big data problems like stock portfolio optimization, image recognition and classification, and detecting anomalies. Yet, because of the exotic way in which quantum computers process information, they are highly sensitive to errors of different kinds. When such errors occur they can erase any quantum computational advantage, so developing methods to overcome errors is of paramount importance in the quest to demonstrate “quantum supremacy.” USC researchers Walter Vinci, Tameem Albash and Daniel Lidar put forth a scheme to minimize errors.
Their solution, explained in the article “Nested Quantum Annealing Correction” published in the journal npj Quantum Information, is focused on reducing and correcting errors associated with heating, a type of error that is common and particularly detrimental in quantum optimizers. Cooling the quantum processor further is not possible since the specialized dilution refrigerator that keeps it cool already operates at its limit, at a temperature approximately 1,000 times colder than outer space. Vinci, Albash and Lidar have developed a new method to suppress heating errors: By coupling several qubits together on a D-Wave Two quantum optimizer, without changing the hardware of the device, these qubits act effectively as one qubit that experiences a lower temperature. The more qubits are coupled, the lower the temperature experienced, allowing researchers to minimize the effect of heating as a source of noise or error. This nesting scheme is implementable not only on platforms such as the D-Wave processor on which it was tested, but also on other future quantum optimization devices with different hardware architectures. The researchers believe that this work is an important step in eliminating a bottleneck for scalable quantum optimization implementations. “Our work is part of a large scale effort by the research community aimed at realizing the potential of quantum information processing, which we all hope might one day surpass its classical counterparts,” said Lidar, a USC Viterbi professor and QCC scientific director.
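The idea that coupled qubits behave like one qubit at a lower temperature can be illustrated with a toy two-level model. The sketch below is not the scheme from the paper: it assumes, purely for illustration, that coupling `n_coupled` qubits divides the effective temperature by `n_coupled ** eta`, where `eta` is a hypothetical exponent, and it uses the standard two-level Boltzmann occupation to show how a lower effective temperature suppresses thermal excitation. Energies and temperatures are in arbitrary units with the Boltzmann constant set to 1.

```python
import math


def effective_temperature(temperature, n_coupled, eta=1.0):
    """Toy model (assumption): coupling n qubits lowers the temperature by n**eta."""
    return temperature / (n_coupled ** eta)


def excitation_probability(gap, temperature):
    """Thermal occupation of the excited state of a two-level system with energy gap `gap`."""
    x = gap / temperature
    weight = math.exp(-x)  # Boltzmann weight of the excited state
    return weight / (1.0 + weight)


# Compare a single qubit with a hypothetical logical qubit built from 4 coupled qubits.
p_single = excitation_probability(1.0, effective_temperature(1.0, 1))
p_nested = excitation_probability(1.0, effective_temperature(1.0, 4))
```

In this toy model the excitation probability drops from roughly 27 percent to under 2 percent when four qubits are coupled, which conveys the qualitative trend described above. The real dependence on the number of coupled qubits is reported in the paper and is not reproduced here.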
News Article | January 13, 2016
Brookhaven Lab Expands Computational Science Initiative: Integrating data-intensive science expertise and investments across the Laboratory to tackle "big data" challenges
Building on its capabilities in data-intensive science, the U.S. Department of Energy's (DOE) Brookhaven National Laboratory has expanded its Computational Science Initiative (CSI, www.bnl.gov/compsci/). The programs within this initiative leverage computational science, computer science, and mathematics expertise and investments across multiple research areas at the Laboratory, including the flagship facilities that attract thousands of scientific users each year, further establishing Brookhaven as a leader in tackling the "big data" challenges at experimental facilities and expanding the frontiers of scientific discovery. Key partners in this endeavor include nearby universities such as Columbia, Cornell, New York University, Stony Brook, and Yale, and companies including IBM Research. In addition to support from the DOE Office of Science and Brookhaven Lab internal investments, the initiative will receive substantial funding from New York State over the next five years. This combined funding will enable the Initiative to pursue its aggressive growth strategy, both in terms of staffing and in extending its operational and research computing infrastructure. The initiative is led by Kerstin Kleese van Dam (Director, https://www.bnl.gov/compsci/people/staff.php?q=123), Michael Ernst (Deputy Director, www.bnl.gov/compsci/people/staff.php?q=109), and Robert Harrison (Chief Scientist, www.bnl.gov/compsci/people/staff.php?q=128).
Advances in computational science, data management, and analysis have been a key factor in the success of Brookhaven Lab's scientific programs at the Relativistic Heavy Ion Collider (RHIC), the National Synchrotron Light Source (NSLS), the Center for Functional Nanomaterials (CFN), all DOE Office of Science User Facilities, and in biological, atmospheric, and energy systems science. Computation also plays a major role in the Lab's collaborative participation in international research endeavors, such as the ATLAS experiment at Europe's Large Hadron Collider. "The Computational Science Initiative (CSI) brings together under one umbrella and extends the expertise that has driven this success," said Kleese van Dam. "Our mission is to foster cross-disciplinary collaborations to address the next generation of scientific data challenges posed by facilities such as NSLS's successor, the new National Synchrotron Light Source II (NSLS-II)." A particular focus of CSI's work will be the research, development and deployment of novel methods and algorithms for the timely analysis and interpretation of high volume, high velocity, heterogeneous scientific data created by experimental, observational, and computational facilities to accelerate and advance scientific discovery. "CSI is taking an integrated approach, engaging in leading-edge research, building the research and operational computing facility infrastructure required, and creating multi-disciplinary teams that deliver operational data analysis capabilities to the scientific user communities," said Kleese van Dam. Core to the initiative is the new Computer Science and Mathematics effort (www.bnl.gov/compsci/mathematics.php) led by Barbara Chapman, a recent joint appointee at Brookhaven Lab and Stony Brook University.
Her team will focus on fundamental research into novel methods and algorithms in support of hypothesis-driven streaming data analysis in high-data-volume and high-data-velocity experimental and computing environments. Further efforts will research new solutions for multi-source streaming data analysis and interpretation, as well as long-term data curation and active reuse. "Reliability, high performance, and energy efficiency are key drivers for CSI's user communities, so the team's research will address all relevant aspects of streaming data processing from hardware architectures to the application layers," Chapman said. CSI's Computational Science Laboratory (CSL, www.bnl.gov/compsci/computational-lab.php) is a new collaborative institute for novel algorithm development and optimization. Bringing together expertise in high-performance computing (HPC), math, and domain science, it will specifically address the challenge of developing novel algorithms to deliver on the promise of exascale science (the ability to compute at a rate of 10^18 floating point operations per second, or one exaFLOPS). CSL will support the development of advanced simulation codes in classic domains such as materials science, chemistry, quantum chromodynamics, fusion, and large eddy simulations. In addition, CSL will provide training and advice to Brookhaven Lab science programs and facilities, enabling them to utilize emerging computing technologies to their full extent. CSL is led by Nicholas D'Imperio. A centerpiece of the initiative will be the new Center for Data-Driven Discovery (C3D, www.bnl.gov/compsci/C3D/), which will serve as an external focal point for CSI's data-centric computing activities.
Within the Laboratory, it will drive the integration of domain, computational, and data science expertise across Brookhaven's science programs and facilities, with the goal of accelerating and expanding scientific discovery by developing, deploying, and operating novel data-management, analysis, and interpretation tools and capabilities. A key focus area will be developing and deploying streaming data analysis services for experimental facilities. Outside the Laboratory, C3D will serve as a focal point for recruiting, collaboration, and communication. Kerstin Kleese van Dam is currently acting as its interim leader until a permanent lead for C3D is identified. The people and capabilities of C3D are integral to the success of Brookhaven's key DOE Office of Science User Facilities, including NSLS-II, RHIC, CFN, and a possible future electron ion collider. Hundreds of scientists from Brookhaven and thousands of facility users from universities, industry, and other laboratories across the country and around the world will benefit from the capabilities developed by C3D personnel to better understand the enormous volumes of data produced at these state-of-the-art research facilities. Underpinning the work of the CSI is the creation of a new, integrated scientific data, computing, and networking infrastructure across the Brookhaven Lab site. This new Scientific Data and Computing Center (www.bnl.gov/compsci/computing-center.php) will be led by Michael Ernst. Brookhaven Lab has a strong history of advances in the successful operation of large-scale computational science, data management, and analysis infrastructure, and the management of large-scale scientific data. One example of Brookhaven's computing expertise is the RHIC & ATLAS Computing Facility (RACF).
Formed in 1997 to support experiments at RHIC, Brookhaven's flagship particle collider for nuclear physics research, the RACF is now at the center of a global computing network connecting more than 2,500 researchers around the world with data produced by RHIC and the ATLAS experiment at the Large Hadron Collider. This world-class center houses an ever-expanding farm of computing cores (50,000 as of 2015), receiving data from the thousands of particle collisions that take place each second at RHIC, along with petabytes of data generated by the LHC's ATLAS experiment, and storing, processing, and distributing that data to, and running analysis jobs for, collaborators around the nation and the world. The success of this distributed approach to data-intensive computing, combined with new approaches for handling data-rich simulations, has helped establish the U.S. as a leader in high-capacity computing, thereby enhancing international competitiveness. RACF will serve as a model for computing and data investigations under the new initiative, and as such will form the core of the new Brookhaven Lab Scientific Data and Computing Center. The new center will also house the Lab's new institutional computing system, new NY State-funded operational data-intensive computing systems, a series of novel architecture research systems, as well as computing and data services operated for other third-party clients. The CSI, in conjunction with CSL and C3D, will also host a series of workshops/conferences and training sessions in high-performance and data-centric computing, including the New York Scientific Data Summit (NYSDS, www.bnl.gov/nysds/). These events will explore topics at the frontier of data-centric, high-performance computing, such as the combination of efficient methodologies and innovative computer systems and concepts to manage and analyze scientific data generated at high volumes and rates. Brookhaven National Laboratory is supported by the Office of Science of the U.S.
Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.
About Brookhaven National Laboratory
One of ten national laboratories overseen and primarily funded by the Office of Science of the U.S. Department of Energy (DOE), Brookhaven National Laboratory conducts research in the physical, biomedical, and environmental sciences, as well as in energy technologies and national security. Brookhaven Lab also builds and operates major scientific facilities available to university, industry and government researchers. Brookhaven is operated and managed for DOE's Office of Science by Brookhaven Science Associates, a limited-liability company founded by the Research Foundation for the State University of New York on behalf of Stony Brook University, the largest academic user of Laboratory facilities, and Battelle, a nonprofit applied science and technology organization.
News Article | April 12, 2016
This is the fifth installment in a series covering how scientists are updating popular molecular dynamics, quantum chemistry and quantum materials code to take advantage of hardware advances, such as the forthcoming Intel Xeon Phi processors. Quantum-mechanical materials and molecular modeling research is the science for materials modeling at the nanoscale. Quantum materials research examines elementary particles using a mathematical interpretation of the structure and interactions of matter. This research has a wide range of applications, such as studying molecular systems for material assemblies, small chemical systems and studying biological molecules. High performance computing (HPC) systems are required for complex quantum materials research, due to the amount of data and the computation power required for calculating mathematical formulas and generating images. Researchers use specialized software such as Quantum ESPRESSO and a variety of HPC software in conducting quantum materials research. Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves and pseudo potentials. Quantum ESPRESSO is coordinated by the Quantum ESPRESSO Foundation and has a growing world-wide user community in academic and industrial research. Its intensive use of dense mathematical routines makes it an ideal candidate for many-core architectures, such as the Intel Xeon Phi coprocessor. The Intel Parallel Computing Centers at Cineca and Lawrence Berkeley National Lab (LBNL) along with the National Energy Research Scientific Computing Center (NERSC) are at the forefront in using HPC software and modifying Quantum ESPRESSO (QE) code to take advantage of Intel Xeon processors and Intel Xeon Phi coprocessors used in quantum materials research. In addition to Quantum ESPRESSO, the teams use tools such as Intel compilers, libraries, Intel VTune and OpenMP in their work. 
The goal is to incorporate the changes they make to Quantum ESPRESSO into the public version of the code so that scientists can benefit from the optimization and parallelization improvements without having to modify legacy code manually. One example of how Cineca used Quantum ESPRESSO to study a real device of promising scientific and technological interest is the electrical conductivity of a PDI-FCN2 molecule. This study was conducted by Cineca in collaboration with the University of Bologna and the National Research Council of Italy - Institute for Nanoscience (CNR-NANO). The object of this study is a two-terminal device based on PDI-FCN2, a molecule derived from perylene. This system is important for the study of electron transport in single-molecule devices and the further development of a new generation of organic field-effect transistors (OFETs). The simulated system is composed of two gold electrodes, each made of 162 gold atoms. Between the electrodes, there is a PDI-FCN2 molecule. The system is made of 390 atoms and 3852 electrons. The metallic nature of the leads also requires a fine sampling of the Brillouin zone in the description of the electronic structure. This further increases the computational effort required to simulate this system. Figures 1 and 2 show the molecular structure and the results of the study. The quantum mechanical solution of the electronic problem for such a large system is a big challenge, and it requires a large HPC computational infrastructure, like the one available at Cineca, and all the scaling properties of Quantum ESPRESSO. Dr.
Carlo Cavazzoni (Principal Investigator at the Cineca Intel Parallel Computing Center) states, “Based on the results obtained by this study, we will gain a deep understanding in the intimate conduction mechanisms of this type of organic devices, going a step forward in the direction of utilizing the new OFET technologies that soon will replace the traditional silicon devices. Quantum ESPRESSO and new supercomputing facilities will make possible our studies and the understanding of the physics of the devices that, in the future, will be the building blocks for new photovoltaics cells, next-generation displays and molecular computers.”
Cineca supercomputer
The Cineca team currently does their research using the Cineca FERMI BGQ supercomputer and an IBM NeXtScale cluster named Galileo based on Intel Xeon processors and Intel Xeon Phi coprocessors (768 Intel Xeon Phi 7120P). Cineca’s next HPC system will be a Lenovo machine named Marconi, delivering 2 PFlops with Intel Xeon processors E5 v4 in a first stage and 11 PFlops with the next-generation Intel Xeon Phi processors before fall of 2016. A third-stage system will add 4.5 PFlops with future Intel Xeon processors and integrate an Intel Omni-Path interconnect. Cineca is engaged in many R&D projects relating to HPC developments. One of the most important is MaX, a center of excellence funded by the European Community, whose ambitions are to establish an infrastructure for materials scientists and to support the development of codes toward the exascale. According to Cavazzoni, “We always focused our work on the part of numerical algorithms on the Fourier transform (FFT) and on the linear algebra modules. We started to rethink the overall organization of the memory hierarchy and parallelization structure. In particular, we modified the code in order to implement a sort of hierarchical tiling of data structures. In order to do so, we had to deeply modify the distribution of the data structure in Quantum ESPRESSO.
The following figure shows the high level QE hierarchy.” Cineca is tiling data structures to efficiently use the computing power of each node. They changed fine-grain parallelism in the QE FFT module by refactoring data distribution using task groups as shown in Figure 4. The move to a many-core model required changing QE code to make it fit the structure of a single node efficiently and splitting QE code into intra-node and inter-node processes. In their work with the new data layout, a single tile of processes, inside a given taskgroup, contains all the G-vectors and a subset of bands to compute full 3-D FFTs. The data tiling can be changed to best match the characteristics of the HPC system, to the limit (if node memory permits) of having a whole 3-D FFT performed by a single taskgroup locally to the node. The following example shows the results of a Car-Parrinello simulation on a system of 32 water molecules. This plot shows the differences between the old implementation (blue) and the new one (red), showing a reduction in the time-to-solution. Different taskgroup distributions are shown in the plot. Simulations were obtained running on an Intel Xeon processor E5-2630 v3. Cavazzoni indicates, “The Intel exascale road-map allows for a smooth innovation path in the code, and a constant improvement of the performance and scalability. The availability of a large number of cores per node has made it possible to tune the different layers of parallelization. A good tiling of the different data structures permits us to efficiently tile the memory and computing power of each node, reducing the amount of communication and, thus, enhancing the performances. We changed the fine grain parallelism of QE and, in particular, the FFT module. Adopting different kind of data distribution (taskgroups) we achieved a good improvement in terms of performance (Figure 5).
However, there is still room for improvement, in particular for the efficiency of the OpenMP multithreading that is now limited to 4-8 threads. This is because workloads that are too small can induce load unbalancing and then a large spinning time. Adopting OpenMP tasking strategies, we are expecting a considerable improvement of the shared memory parallelism based on the new task level parallelism which is implemented in OpenMP4. We have already done some tests that make us think that we can remove the bottleneck displayed by synchronous thread level parallelism.” The main focus of Lawrence Berkeley National Lab (LBNL), working with the National Energy Research Scientific Computing Center (NERSC), is to advance open-source quantum chemistry and materials codes on multicore high-performance computing systems. They are jointly optimizing a variety of codes, including NWChem and Quantum ESPRESSO. NERSC is a national supercomputing center that serves the supercomputing mission and data needs of the U.S. Department of Energy Office of Science. NERSC is part of the Lawrence Berkeley National Laboratory adjacent to the University of California campus. NERSC is also experimenting with modifying Quantum ESPRESSO code, since it is one of the most commonly used codes on NERSC systems. According to Taylor Barnes, LBNL Hopper Fellow, “In particular, we are interested in improving the performance of hybrid Density Functional Theory (DFT) calculations within Quantum ESPRESSO. Hybrid DFT is often more accurate than other types of DFT, and can be especially important for performing simulations of systems like batteries and photovoltaic cells. 
Unfortunately, hybrid DFT is also much more computationally demanding and, thus, many of the calculations that we would like to perform are difficult or impossible to run on current machines.” One of the LBNL/NERSC strategies for improving the performance of hybrid calculations in Quantum ESPRESSO has been to refactor and modify the hybrid sections of the code. Barnes states, “In doing so, we have made significant changes to both the communication and parallelization strategies, leading to large improvements in the code’s strong scaling efficiency.” Another focus of the LBNL/NERSC efforts is the investigation of improved ways to handle the parallelization of the FFTs, which are an integral part of any calculation in Quantum ESPRESSO. “FFTs are notoriously difficult to parallelize efficiently across nodes; as a result, we are exploring strategies for distinguishing between intra-node parallelization of the FFTs using OpenMP and inter-node parallelization of other portions of the calculation using MPI. Our expectation is that these changes will be especially important on Intel Xeon Phi architectures,” indicates Barnes.

How HPC will aid quantum materials research in the future

Cineca, LBNL and NERSC all have a vision of how improved HPC code and Intel processors and coprocessors can improve the future of quantum materials research. The work these groups are doing to modify code to take advantage of HPC parallelization and optimization is especially important because there are not enough software engineers to adapt legacy codes. The work they are doing is being reviewed, and the optimization and parallelization modifications made by Cineca have been approved and incorporated into Release 5.3.0 of the Quantum ESPRESSO code. 
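The reason FFT data layout matters so much in the work described above can be seen in a small single-process sketch. It assumes NumPy is available; the function name `fft3d_by_slabs` and the idea of mapping one slab of planes to one "task group" are illustrative assumptions, not Quantum ESPRESSO's actual routines or data structures. The sketch shows that a 3-D FFT splits into 2-D transforms that each group can do on its own planes, followed by 1-D transforms along the remaining axis that need data from every group.

```python
import numpy as np

def fft3d_by_slabs(grid, n_groups):
    """3-D FFT computed slab by slab (single-process sketch)."""
    nz = grid.shape[0]
    # Split the z-planes among hypothetical "task groups".
    slabs = np.array_split(np.arange(nz), n_groups)
    stage1 = np.empty(grid.shape, dtype=complex)
    for planes in slabs:
        # Each group transforms its own planes in x and y.
        # No data from other groups is needed for this step.
        stage1[planes] = np.fft.fft2(grid[planes], axes=(1, 2))
    # The remaining 1-D FFTs run along z and touch every slab.
    # On a cluster, this is where an all-to-all exchange would occur.
    return np.fft.fft(stage1, axis=0)

rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 6, 4))
result = fft3d_by_slabs(grid, n_groups=3)
# Compare against NumPy's direct 3-D transform.
matches_reference = np.allclose(result, np.fft.fftn(grid))
```

If all planes fit in the memory of one node, the second step needs no network traffic at all, which is the limiting case the Cineca team describes for a whole 3-D FFT performed locally by a single taskgroup.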
Both the LBNL and NERSC teams are active in the Intel Xeon Phi User's Group (IXPUG) and in the exchange of information and ideas to enhance the usability and efficiency of scientific applications running on large Intel Xeon Phi coprocessor-based high performance computing (HPC) systems. NERSC will be getting a large next-generation Intel Xeon Phi-processor-based supercomputer known as Cori late in 2016. NERSC has launched the NERSC Exascale Science Applications Program, which will allow 20 projects to collaborate with NERSC, Cray and Intel by providing access to early hardware, special training and preparation sessions. Project teams, guided by NERSC, Cray and Intel, will undertake intensive efforts to adapt software to take advantage of Cori's manycore architecture and to use the resultant codes to produce path-breaking science on an architecture that may represent an approach to exascale systems. Cavazzoni states, “In the context of the MaX project, we are committed to work on different codes from the community of material science in order to get ready for the exascale challenges. One of our main targets is to contribute to the modularization of such codes in order to build domain-specific libraries to be usable in different codes and/or complex workflows as LEGO blocks. This high degree of modularization will also allow our team to increase the performances and the suitability for new incoming architectures. In QE, we are already performing this work, and we recently packed all the functionalities related to the FFT kernels in a specific library. We are doing similar work for the linear algebra (such as diagonalization and eigenvalue problems) kernels. Together with MaX, we are also exploring new parallel paradigms and their possible usage in QE. In particular, we are interested in the tasking strategies implemented in the OpenMP standard. 
The advent of the Intel Xeon Phi architecture platforms gave us a strong motivation to increase the level of exposed parallelism in QE. Working on this aspect brings us much closer to the exascale scalability. The Intel Xeon Phi architecture clearly tells us that what will make the difference is the ability to use the shared memory paradigm and node resources best. We need to allow the allocation of a single MPI task per socket, where the best ratio today for MPI/threads is 1/2, 1/4, quite unlikely 1/8, and nothing above. We should improve the shared memory efficiency to have the possibility to use MPI to threads ratio in the order of 1 to 32 at least. And this will be valuable for any architecture, not only for the Intel Xeon Phi processor. All these enhancements will be soon tested on the upcoming Intel Xeon Phi processors that will be available this year in new supercomputers.” Linda Barney is the founder and owner of Barney and Associates, a technical/marketing writing, training and web design firm in Beaverton, OR.
News Article | November 12, 2015
Dark matter is one of the basic ingredients of the universe, and searches to detect it in laboratory-based experiments have been conducted for decades. However, until today dark matter has been observed only indirectly, via its gravitational interactions that govern the dynamics of the cosmos at all length-scales. It is expected that dark matter is made of a new, stable elementary particle that has escaped detection so far. "We expect that several tens of thousands of dark matter particles per second are passing through the area of a thumbnail," said Luca Grandi, a UChicago assistant professor in physics and a member of the Kavli Institute for Cosmological Physics. "The fact that we did not detect them yet tells us that their probability to interact with the atoms of our detector is very small, and that we need more sensitive instruments to find the rare signature of this particle." Grandi is a member of the XENON Collaboration, which consists of 21 research groups from the United States, Germany, Italy, Switzerland, Portugal, France, the Netherlands, Israel, Sweden and the United Arab Emirates. The collaboration's inauguration event took place Nov. 11 at the Laboratori Nazionali del Gran Sasso, one of the largest underground laboratories in the world. "We need to put our experiment deep underground, using about 1,400 meters of solid rock to shield it from cosmic rays," said Grandi, who participated in the inauguration along with guests from funding agencies as well as journalists and colleagues. About 80 visitors joined the ceremony at the laboratory's experimental site, which measures 110 meters long, 15 meters wide and 15 meters high. There, the new instrument is installed inside a 10-meter-diameter water shield to protect it from radioactive background radiation that originates from the environment. During introductory presentations, Elena Aprile, Columbia University professor and founder of the XENON project, illustrated the evolution of the program. 
It began with a 3-kilogram detector 15 years ago. The present-day instrument has a total mass of 3,500 kilograms. XENON1T employs the ultra-pure noble gas xenon as dark matter detection material, cooled down to –95 degrees Celsius to make it liquid. "In order to see the rare interactions of a dark matter particle in your detector, you need to build an instrument with a large mass and an extremely low radioactive background," said Grandi. "Otherwise you will have no chance to find the right events within the background signals." For this reason, the XENON scientists have carefully selected all materials used in the construction of the detector, ensuring that their intrinsic contamination with radioactive isotopes meets the low-background experiment's requirements. "One has to realize that objects without any radioactivity do not exist," Grandi explained. "Minute traces of impurities are present in everything, from simple things like metal slabs to the walls of the laboratory to the human body. We are trying to reduce and control these radioactive contaminants as much as possible." The XENON scientists measure tiny flashes of light and charge to reconstruct the position of the particle interaction within their detector, as well as the deposited energy and whether it might be induced by a dark matter particle or not. The light is observed by 248 sensitive photosensors, capable of detecting even single photons. A vacuum-insulated double-wall cryostat, resembling a gigantic version of a thermos flask, contains the cryogenic xenon and the dark matter detector. The xenon gas is cooled and purified in the three-story XENON building, an installation with a transparent glass facade next to the water shield, which allows visitors to view the scientists inside. A gigantic stainless-steel sphere equipped with pipes and valves is installed on the ground floor. "It can accommodate 7.6 tons of xenon in liquid and gaseous form," said Aprile. 
"This is more than two times the capacity we need for XENON1T, as we want to be prepared to swiftly increase the sensitivity of the experiment with a larger mass detector in the near future." Once fully operational, XENON1T will be the most sensitive dark matter experiment in the world. Grandi's group has been deeply involved in the preparation and assembly of the xenon Time Projection Chamber, the core of the detector. His group is also in charge for the development of the U.S. computing center for XENON1T data analysis via the UChicago Research Computing Center, directed by Birali Runesha, in close cooperation with Robert Gardner and his team at the Computation Institute. In addition to Columbia's Aprile, leading the other six U.S. institutions are Ethan Brown, Rensselaer Polytechnic Institute; Petr Chaguine, Rice University; Rafael Lang, Purdue University; Kaixuan Ni, University of California, San Diego; and Hanguo Wang, University of California, Los Angeles. XEON1T's first results are expected in early 2016. The collaboration expects the instrument to achieve most of its objectives within two years of data collection. The researchers then will move their project into a new phase. "Of course we want to detect the dark matter particle," Grandi said, "but even if we have only found some hints after two years, we are in an excellent position to move on as we are already now preparing the next step of the project, which will be the far more sensitive XENONnT."
News Article | November 19, 2015
A Florida State University high performance computing researcher has predicted a physical effect that would help physicists and astronomers provide fresh evidence of the correctness of Einstein’s general theory of relativity. Bin Chen, who works at the university’s Research Computing Center, describes the yet-to-be-observed effect in the paper “Probing the Gravitational Faraday Rotation Using Quasar X-ray Microlensing,” published November 17, 2015, in the journal Scientific Reports. “To be able to test general relativity is of crucial importance to physicists and astronomers,” Chen said. This testing is especially so in regions close to a black hole, according to Chen, because the current evidence for Einstein’s general relativity — light bending by the sun, for example — mainly comes from regions where the gravitational field is very weak, or regions far away from a black hole. Electromagnetism demonstrates that light is composed of oscillating electric and magnetic fields. Linearly polarized light is an electromagnetic wave whose electric and magnetic fields oscillate along fixed directions when the light travels through space. The gravitational Faraday effect, first predicted in the 1950s, theorizes that when linearly polarized light travels close to a spinning black hole, the orientation of its polarization rotates according to Einstein’s theory of general relativity. Currently, there is no practical way to detect gravitational Faraday rotation. In the paper, Chen predicts a new effect that can be used to detect the gravitational Faraday effect. His proposed observation requires monitoring the X-ray emissions from gravitationally lensed quasars. “This means that light from a cosmologically distant quasar will be deflected, or gravitationally lensed, by the intervening galaxy along the line of sight before arriving at an observer on the Earth,” said Chen of the phenomenon of gravitational lensing, which was predicted by Einstein in 1936. 
More than 100 gravitational lenses have been discovered so far. “Astronomers have recently found strong evidence showing that quasar X-ray emissions originate from regions very close to supermassive black holes, which are believed to reside at the center of many galaxies,” Chen said. “Gravitational Faraday rotation should leave its fingerprints on such compact regions close to a black hole. “Specifically, the observed X-ray polarization of a gravitationally microlensed quasar should vary rapidly with time if the gravitational Faraday effect indeed exists,” he said. “Therefore, monitoring the X-ray polarization of a gravitationally lensed quasar over time could verify the time dependence and the existence of the gravitational Faraday effect.” If detected, Chen’s effect — a derivative of the gravitational Faraday effect — would provide strong evidence of the correctness of Einstein’s general relativity theory in the “strong-field regime,” or an environment in close proximity to a black hole. Chen generated a simulation for the paper on the FSU Research Computing Center’s High-Performance Computing cluster — the second-largest computer cluster in Florida.
News Article | March 30, 2016
Today’s installment is the third in a series covering how researchers from national laboratories and scientific research centers are updating popular molecular dynamics, quantum chemistry and quantum materials code to take advantage of hardware advances, such as the next-generation Intel Xeon Phi processors. Georgia Institute of Technology, known as Georgia Tech, is an Intel Parallel Computing Center (Intel PCC) that focuses on modernizing the performance and functionality of software on advanced HPC systems used in scientific discovery. Georgia Tech developed a new HPC software package, called GTFock, and the SIMINT library to make quantum chemistry and materials simulations run faster on servers and supercomputers using Intel Xeon processors and Intel Xeon Phi coprocessors. These tools, which continue to be improved, provide an increase in processing speed over the best state-of-the-art quantum chemistry codes in existence. “GTFock and SIMINT allow us to perform quantum chemistry simulations faster and with less expense, which can help in solving large-scale problems from fundamental chemistry and biochemistry to pharmaceutical and materials design,” states Edmond Chow, Associate Professor of Computational Science and Engineering and Director of the Georgia Institute of Technology Intel PCC. The Intel PCC at Georgia Tech has been simulating the binding of the drug Indinavir with human immunodeficiency virus (HIV) II protease. Indinavir is a protease inhibitor that competitively binds to the active site of HIV II protease to disrupt normal function as part of HIV treatment therapy. Such systems are too large to study quantum mechanically, so only a part of the protease closest to the drug is typically simulated. The aim of the work at Georgia Tech is to quantify the discrepancy in the binding energy when such truncated models of the protease are used. To do this, simulations with increasingly larger portions of the protease are performed. 
These are enabled by the GTFock code, developed at the Georgia Tech Intel PCC in collaboration with Intel, which has been designed to scale efficiently on large cluster computers, including Intel Many Integrated Core (MIC) architecture clusters. Calculations were performed at the Hartree-Fock level of theory. The largest simulations included residues of the protease more than 18 Angstroms away from the drug molecule. These simulations involved almost 3000 atoms and were performed on more than 1.6 million compute cores of the Tianhe-2 supercomputer (an Intel Xeon processor and Intel Xeon Phi processor-based system that is currently number one on the TOP500 list). The results of this work so far show variations in binding energy that persist throughout the range up to 18 Angstroms. This suggests that even at relatively large cutoff distances, leading to very large model complexes (much larger than are typically possible with conventional codes and computing resources), the binding energy is not converged to within chemical accuracy. Further work is planned to validate these results as well as to study additional protein-ligand systems.

New quantum chemistry code: GTFock

The GTFock code was developed by the Georgia Tech Intel PCC in conjunction with the Intel Parallel Computing Lab. GTFock addresses one of the main challenges of quantum chemistry, which is the ability to run more accurate simulations and simulations of larger molecules through exploiting distributed memory processing. GTFock was designed as a new toolkit with optimized and scalable code for Hartree-Fock self-consistent field iterations and the distributed computation of the Fock matrix in quantum chemistry. The Hartree-Fock (HF) method is one of the most fundamental methods in quantum chemistry for approximately solving the electronic Schrödinger equation. The solution of the equation, called the wavefunction, can be used to determine properties of the molecule. 
Georgia Tech’s goals in the code design of GTFock include scalability to large numbers of nodes and the capability to simultaneously use CPUs and Intel Xeon Phi coprocessors. GTFock also includes infrastructure for performing self-consistent field (SCF) iterations to solve for the Hartree-Fock approximation and uses a new distributed algorithm for load balancing and reducing communication. GTFock code can be integrated into existing quantum chemistry packages and can be used for experimentation as a benchmark for high-performance computing. The code is capable of separately computing the Coulomb and exchange matrices and, thus, can be used as a core routine in many quantum chemistry methods. As part of IPCC collaborations, Georgia Tech graduate student Xing Liu and Intel researcher Sanchit Misra spent a month in China optimizing and running GTFock on Tianhe-2. During testing, the team encountered scalability problems when scaling up the code to 8100 nodes on Tianhe-2. They resolved these issues by using a better static partitioning and a better work stealing algorithm than used in previous work. They utilized the Intel Xeon Phi coprocessors on Tianhe-2 by using a dedicated thread on each node to manage offload to coprocessors and to use work stealing to dynamically balance the work between CPUs and coprocessors. The electron repulsion integral (ERI) calculations were also optimized for modern processors including the Intel Xeon Phi coprocessor. The partitioning framework used in GTFock is useful for comparing existing and future partitioning techniques. The best partitioning scheme may depend on the size of the problem, the computing system used and the parallelism available. In Fock matrix construction, each thread sums to its own copy of Fock submatrices in order to avoid contention for a single copy of the Fock matrix on a node. 
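The per-thread accumulation strategy just described can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GTFock's code: the function name `accumulate_private`, the tiny matrix size, and the `(row, col, value)` task format are all hypothetical, and a shared work queue stands in for GTFock's work-stealing scheduler, which is a different and more elaborate technique.

```python
import queue
import threading

def accumulate_private(tasks, n_threads, size):
    """Sum (row, col, value) contributions using one private matrix per thread."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)
    # One private size x size accumulator per thread, so no lock is
    # needed while contributions are being added.
    private = [[[0.0] * size for _ in range(size)] for _ in range(n_threads)]

    def worker(tid):
        local = private[tid]
        while True:
            try:
                row, col, value = work.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            local[row][col] += value

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Final reduction: add the private copies element by element.
    total = [[0.0] * size for _ in range(size)]
    for local in private:
        for i in range(size):
            for j in range(size):
                total[i][j] += local[i][j]
    return total

# Hypothetical workload: 300 unit contributions spread over a 3 x 3 matrix.
tasks = [(i % 3, (i * 2) % 3, 1.0) for i in range(300)]
fock_like = accumulate_private(tasks, n_threads=4, size=3)
```

The design trades memory for freedom from contention: storage grows as the number of threads times the matrix size, so the approach becomes costly when many threads share a small memory.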
However, accelerators including Intel Xeon Phi coprocessors have limited memory per core, making this strategy impossible for reduction across many threads. Thus, novel solutions had to be designed. Figure 2 shows speed up results from running the GTFock code. A deficiency in quantum chemistry codes that Georgia Tech saw had to be addressed is the bottleneck of computing quantities called electron repulsion integrals. This calculation is a very computationally intensive step: there are many of these integrals to calculate and these calculations do not run efficiently on modern processors, including the Intel Xeon processor. One of the reasons is that the existing codes do not take advantage of single instruction, multiple data (SIMD) processing that is available on these processors. It is difficult for algorithms to exploit SIMD operations because of the structure of the algorithms. The existing algorithms that are used are recursive in multiple dimensions and require substantial amounts of intermediate data. In general, it is difficult to vectorize these calculations. Many attempts in the past involved taking existing libraries and rearranging code elements to try to optimize and speed up the calculations. The Georgia Tech team felt it was necessary to create a new library for electron integral calculations from scratch. The library they created is called SIMINT, which means Single Instruction Multiple Integral (named by SIMINT library developer Ben Pritchard). This library applies SIMD instructions to compute multiple integrals at the same time, which is the efficient mode of operation of Intel Xeon processors as well as the Intel Xeon Phi microarchitecture (MIC), which has wide SIMD units. SIMINT is a library for calculating electron repulsion integrals. The Georgia Tech PCC team designed it to use the SIMD features of Intel Xeon processors — it is highly efficient and faster than other state-of-the-art ERI codes. 
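The idea of computing many integrals of the same type at once can be illustrated with a much simpler integral than an ERI. The sketch below is not SIMINT's API or algorithm. It uses the closed-form overlap of two unnormalized s-type Gaussian primitives, S = (π/(a+b))^(3/2) · exp(−ab·R²/(a+b)), and assumes NumPy is available, with NumPy array operations standing in for SIMD lanes. The function names are illustrative.

```python
import math
import numpy as np

def overlap_scalar(a, b, r2):
    """Overlap of two unnormalized s-type Gaussians, one pair at a time."""
    p = a + b
    return (math.pi / p) ** 1.5 * math.exp(-a * b / p * r2)

def overlap_batch(a, b, r2):
    """The same formula applied to whole arrays of pairs at once."""
    p = a + b
    return (np.pi / p) ** 1.5 * np.exp(-a * b / p * r2)

rng = np.random.default_rng(1)
a = rng.uniform(0.1, 5.0, size=1000)   # exponents of the first Gaussians
b = rng.uniform(0.1, 5.0, size=1000)   # exponents of the second Gaussians
r2 = rng.uniform(0.0, 9.0, size=1000)  # squared distances between centers

# One integral per loop iteration versus one batched evaluation.
one_by_one = np.array([overlap_scalar(x, y, z) for x, y, z in zip(a, b, r2)])
batched = overlap_batch(a, b, r2)
agree = np.allclose(one_by_one, batched)
```

Batching works here because every pair follows the same sequence of arithmetic operations, which is why integrals of the same type have to be grouped together before they can be evaluated this way.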
The approach is to use horizontal vectorization; thus, you must compute batches of integrals of the same type together. The Georgia Tech team has posted information so that users can take a look. The team uses Intel VTune amplifier extensively in optimizing SIMINT, because it helps tune the vectorization and cache performance. Developers know how fast the processor can go and the speed limits of the calculation because of the instructions they need to perform. Intel VTune amplifier provides a variety of statistics at a line of code level that help determine why they may not be reaching the expected performance. Figure 3 shows an approximate 2x speedup over libint with a test case that has many worst-case configurations. Figure 4 shows a 3x speedup for another basis set without worst-case configurations. “SIMINT has been designed specifically to efficiently use SIMD features of Intel processors and co-processors. As a result, we’re already seeing speedups of 2x to 3x over the best existing codes.” Edmond Chow, Associate Professor of Computational Science and Engineering and Director of the Georgia Institute of Technology Intel PCC. “GTFock has attracted the attention of other developers of quantum chemistry packages. We have already integrated GTFock into PSI4 to provide distributed memory parallel capabilities to that package. In addition, we have exchanged visits with the developers of the NWChem package to initiate integration of GTFock into NWChem (joint work with Edo Apra and Karol Kowalski, PNNL). Along with SIMINT, we hope to help quantum chemists get their simulations — and their science — done faster,” states Chow. Linda Barney is the founder and owner of Barney and Associates, a technical/marketing writing, training and web design firm in Beaverton, OR.
News Article | January 20, 2016
The Texas Advanced Computing Center (TACC) at The University of Texas at Austin (UT Austin) announced that the Lonestar 5 supercomputer is in full production and is ready to contribute to advancing science across the state of Texas. Managed by TACC, the center's second petaflop system is primed to be a leading computing resource for the engineering and science research community. The supercomputer is sponsored by UT System in partnership with UT Austin, Texas Tech University, Texas A&M University, and the Institute for Computational Engineering and Sciences (ICES) and the Center for Space Research at The University of Texas at Austin. The technology partners are Cray, Intel and DataDirect Networks. Lonestar 5 is designed for academic researchers, serving as the primary high performance computing resource in the UT Research Cyberinfrastructure (UTRC) initiative. Sponsored by The University of Texas System (UT System), UTRC provides a combination of advanced computational systems, a large data storage opportunity, and high bandwidth data access. UTRC enables researchers within all 14 UT System institutions to collaborate with each other and compete at the forefront of science and discovery. The new Lonestar 5 Cray XC40 supercomputer, which contains more than 30,000 Intel Xeon processing cores from the E5-2600 v3 product family, provides a peak performance of 1.25 petaflops. With 24 processing cores per compute node, Lonestar 5 follows the trend of more cores per node that the industry sees in every generation of microprocessors. The system is the fifth in a long line of systems available for Texas researchers, dating back over 15 years to the original Lonestar 1 system (also a Cray). The system will continue to serve its mainstay user communities with an emphasis on addressing a wide variety of research areas in engineering, medicine and the sciences. A number of researchers have been using Lonestar 5 in an "early user" mode over the last few months. 
Researchers from UT System institutions and contributing partners wishing to request access to Lonestar 5 should do so via the TACC User Portal.
News Article | December 2, 2015
The creation and nurturing of supercomputing capabilities has become a national strategic priority for countries all over the world, including the Czech Republic, which — as of September 2015 — houses the largest installation of an Intel Xeon Phi-accelerated cluster in Europe. The Salomon supercomputer is located at IT4Innovations, the Czech Republic’s National Supercomputing Center at VSB-Technical University of Ostrava. Ranked 48th on the TOP500 list of the world’s most powerful supercomputers (the Czech Republic’s first appearance in the 22-year history of the list), and 14th in Europe, the new system supports the scientific and industrial/manufacturing HPC needs of academic research institutions and businesses across the country. In short, the system puts the Czech Republic on the supercomputing map. Salomon is based on the SGI ICE X system and has 1,008 nodes containing 2,016 Intel Xeon processors E5-2680, a total of 24,192 cores. They are supported by 864 Intel Xeon Phi coprocessors, with 52,704 cores, and 13.8 terabytes of RAM in 432 accelerated nodes. With a theoretical computing performance of two petaflops, Salomon exceeds IT4Innovations’ previous supercomputer by more than 20 times. “We wanted our new system to be X86-based because of the large number of applications that run on the X86 architecture. Further, IT4Innovations is an Intel Parallel Computing Center (IPCC) and receives strong support from Intel," said Dr. Branislav Jansík, Head of Supercomputing Services at IT4Innovations. “Intel Xeon Phi processors are capable of delivering huge performance with a small footprint and energy cost. 
Scientific applications benefit from the automatic offload of the BLAS (Basic Linear Algebra Subprograms) functions, which pays off for large problem sizes.” “The applications deployed at IT4Innovations that benefit most from the Intel Xeon Phi processor are ESPRESSO, BEM4I, Blender and SeiSol,” Jansik continued as he explained the applications: “We also plan to deploy a GPAW Intel Xeon Phi processor acceleration code developed at CSC Finland, used for electronic structure calculations in materials engineering,” he said. “Additionally, a number of other applications benefit indirectly, via the processor’s automatic offload. This includes the R statistical package and Octave, an open source language compatible to MATLAB.” Jansik added that “along with SGI, Intel has been tremendously helpful in optimizing the system for our application needs.” The unveiling of Salomon is the culmination, to date, of a lengthy effort by the Czech Republic to build a world-class supercomputing capability. Prior to the creation of IT4Innovations in 2011, the country relied largely on grid computing infrastructure and HPC capacity from other European countries, hampering research work. In 2013, the center introduced its first supercomputer, Anselm, which supported hundreds of projects, but demand for cycles and greater compute power soon outpaced capacity. Salomon supports hundreds of scientists and engineers in the Czech Republic conducting more than 300 research projects across the fields of materials science, bio-science, cosmology, astronomy, structural mechanics of liquids, geophysics, climatology, molecular modeling, plasma and particle physics, disease discovery, computer science and applied mathematics. Projects include RODOS, providing complex solutions for modeling, managing and optimizing transport; and Floreon+, which is used to model and predict flooding in the Moravian-Silesian region of the Czech Republic. 
“The increased power of the supercomputer enables the users to scale their ambition and study much more complex problems,” Jansik said. “We see this trend across the entire spectrum of the scientific disciplines, with most requests for computer time increased by an order of magnitude.” In order to fully leverage the computing power of Salomon, the IPCC at IT4Innovations is focused on “code modernization,” an effort joined by IPCCs around the world to optimize HPC applications to run across hundreds, or thousands, of Intel Xeon Phi processor-accelerated nodes. The IT4I Center develops highly parallel algorithms and application libraries focused on state-of-the-art sparse iterative linear solvers. It also works on the vectorization of HPC community codes, such as Elmer and OpenFOAM. IT4I is part of PRACE, the Partnership for Advanced Computing in Europe, an international network of supercomputer centers that supports high-impact scientific discovery and engineering research and development to enhance European competitiveness. Early reports from the Salomon user community are strongly positive, Jansik said. The system had a 90 percent subscription rate before it was delivered, with nearly 50 million core hours of compute time requested. "The scientific community in the Czech Republic gained a premium scientific tool, and I believe that, in the medium and long term, it will benefit not only the research community but also the industry and the Czech economy," said Martin Palkovič, Director, IT4Innovations.
News Article | November 17, 2015
Bin Chen, who works at the university's Research Computing Center, describes the yet-to-be-observed effect in the paper "Probing the Gravitational Faraday Rotation Using Quasar X-ray Microlensing," published today in the journal Scientific Reports. "To be able to test general relativity is of crucial importance to physicists and astronomers," Chen said. Such testing is especially important in regions close to a black hole, according to Chen, because the current evidence for Einstein's general relativity (light bending by the sun, for example) mainly comes from regions where the gravitational field is very weak, or regions far away from a black hole. Electromagnetism demonstrates that light is composed of oscillating electric and magnetic fields. Linearly polarized light is an electromagnetic wave whose electric and magnetic fields oscillate along fixed directions as the light travels through space. The gravitational Faraday effect, first predicted in the 1950s, holds that when linearly polarized light travels close to a spinning black hole, the orientation of its polarization rotates, as predicted by Einstein's theory of general relativity. Currently, there is no practical way to detect gravitational Faraday rotation. In the paper, Chen predicts a new effect that can be used to detect the gravitational Faraday effect. His proposed observation requires monitoring the X-ray emissions from gravitationally lensed quasars. "This means that light from a cosmologically distant quasar will be deflected, or gravitationally lensed, by the intervening galaxy along the line of sight before arriving at an observer on the Earth," said Chen of the phenomenon of gravitational lensing, which was predicted by Einstein in 1936. More than 100 gravitational lenses have been discovered so far.
"Astronomers have recently found strong evidence showing that quasar X-ray emissions originate from regions very close to supermassive black holes, which are believed to reside at the center of many galaxies," Chen said. "Gravitational Faraday rotation should leave its fingerprints on such compact regions close to a black hole. "Specifically, the observed X-ray polarization of a gravitationally microlensed quasar should vary rapidly with time if the gravitational Faraday effect indeed exists," he said. "Therefore, monitoring the X-ray polarization of a gravitationally lensed quasar over time could verify the time dependence and the existence of the gravitational Faraday effect." If detected, Chen's effect—a derivative of the gravitational Faraday effect—would provide strong evidence of the correctness of Einstein's general relativity theory in the "strong-field regime," or an environment in close proximity to a black hole. Chen generated a simulation for the paper on the FSU Research Computing Center's High-Performance Computing cluster—the second-largest computer cluster in Florida. More information: Bin Chen. Probing the gravitational Faraday rotation using quasar X-ray microlensing, Scientific Reports (2015). DOI: 10.1038/srep16860