Li Q.,National Research Center for Intelligent Computing Systems | Li Q.,Chinese Academy of Sciences | Li Q.,University of Chinese Academy of Sciences | Huo Z.,National Research Center for Intelligent Computing Systems | And 3 more authors.
Proceedings - 9th International Conference on Grid and Cloud Computing, GCC 2010 | Year: 2010

Personal high performance computers (PHPCs) require low cost and high performance. Teraflops PHPC systems with special accelerator units such as GPGPUs have been presented, but they are difficult to program and have limited compatibility and applicability. In this paper, we present HPP-PHPC, a hybrid architecture of heterogeneous processors connected by a non-coherent off-chip system bus. The performance of HPP-PHPC is ensured by special processors integrated with vector units and by a high-efficiency interconnect between the heterogeneous processors. By adopting general-purpose processors and hardware features such as a global physical address space and synchronization semantics, HPP-PHPC is more compatible with, and more convenient for, the message passing and PGAS programming models. It is also applicable to most applications, including those with many execution branches. Initial results obtained from our prototype system validate our design. © 2010 IEEE.


Li Q.,National Research Center for Intelligent Computing Systems | Li Q.,Chinese Academy of Sciences | Li Q.,University of Chinese Academy of Sciences | Li B.,National Research Center for Intelligent Computing Systems | And 6 more authors.
Frontiers of Computer Science in China | Year: 2010

An increasing number of supercomputers adopt a heterogeneous architecture consisting of both general-purpose CPUs and specialized accelerators. Such a design benefits scalability and power, but heterogeneity brings new challenges for the communication system, which must connect heterogeneous components and support programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intranode layer between heterogeneous components. To connect heterogeneous components efficiently, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; it also employs an InfiniBand network and provides an OS-bypassing virtualization method to share an InfiniBand card between nodes. To facilitate programming on heterogeneous processors, it supports Unified Parallel C (UPC) with a modified compiler based on the global address space. A special collective network is also implemented for collective operations. Results obtained from a prototype system show these features to be both feasible and efficient. © 2010 Higher Education Press and Springer-Verlag Berlin Heidelberg.
