Beijing, China

Chen X.,Chinese Academy of Sciences | Chen X.,University of Chinese Academy of Sciences | Zhang G.,Huawei | Wang H.,Loongson Corporation | And 6 more authors.
Proceedings - Design, Automation and Test in Europe, DATE | Year: 2015

Because software-based simulators face a speed bottleneck, FPGA-based simulation has been increasingly explored. This paper proposes a novel methodology for simulating a chip multiprocessor (CMP) on limited FPGA resources. By mixing real cores and pseudo cores (MRP), we can simulate a multicore system with lower FPGA resource requirements and at a much higher simulation speed. We propose several methods for constructing the pseudo cores. We implement the idea on a dual Virtex-6 FPGA board to simulate a general-purpose, high-performance 4-core CMP. Comparison experiments against the corresponding taped-out chip demonstrate the effectiveness of MRP. We also evaluate the MRP prototype's performance by running SPEC CPU2006 benchmarks on an unmodified Linux operating system, achieving speedups of tens to hundreds of times over two other commonly used simulators. © 2015 EDAA.


Chen L.,Chinese Academy of Sciences | Chen L.,CAS Institute of Computing Technology | Chen L.,University of Chinese Academy of Sciences | Chen L.,Loongson Corporation | And 13 more authors.
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2013

Based on an analysis of memory accesses during synchronization in chip multiprocessors (CMPs), a method for recognizing the type of synchronization was proposed, and a novel cache coherence protocol for optimizing CMP synchronization was designed. The protocol adds a new cache state that carries synchronization information, allowing a CMP to serialize synchronization operations by blocking competing requests, which guarantees that atomic operations execute successfully. As a result, the number of memory accesses caused by synchronization conflicts is greatly reduced, and the memory-access pattern during synchronization comes close to the ideal. Experimental results show that with the proposed protocol, synchronization performance is nearly ideal: compared with a traditional cache coherence protocol, synchronization performance can be increased by 100%, and the overall execution time of parallel programs can be reduced by 25%.
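
The blocking behaviour described above can be illustrated with a small model. This is a hypothetical sketch, not the authors' protocol: it assumes a single directory-tracked lock line with an invented `SYNC` state, a FIFO queue of blocked requesters, and a deliberately crude test-and-set baseline in which every waiting core re-fetches the line once per round. The names `SyncLine`, `run_blocking`, and `spin_transfers` are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class SyncLine:
    """Hypothetical directory entry for one cache line holding a lock word.

    'I'    = not held by any core
    'SYNC' = held by `owner` for a synchronization (atomic) operation;
             other requesters are blocked in `waiters` instead of
             repeatedly stealing the line.
    """
    state: str = "I"
    owner: Optional[int] = None
    waiters: Deque[int] = field(default_factory=deque)
    transfers: int = 0  # number of times the line moves to a core

    def acquire(self, core: int) -> bool:
        """Request the line for an atomic op; True if granted immediately."""
        if self.state == "I":
            self.state, self.owner = "SYNC", core
            self.transfers += 1
            return True
        self.waiters.append(core)  # block the requester; no line transfer
        return False

    def release(self) -> Optional[int]:
        """End the critical section; hand the line straight to the next waiter."""
        if self.waiters:
            self.owner = self.waiters.popleft()
            self.transfers += 1
            return self.owner
        self.state, self.owner = "I", None
        return None


def run_blocking(n_cores: int) -> Tuple[List[int], int]:
    """All cores contend at once; return (grant order, line transfers)."""
    line = SyncLine()
    order: List[int] = []
    for core in range(n_cores):
        if line.acquire(core):
            order.append(core)
    while (nxt := line.release()) is not None:
        order.append(nxt)
    return order, line.transfers


def spin_transfers(n_cores: int, hold_rounds: int) -> int:
    """Crude test-and-set baseline: while the k-th winner holds the lock for
    `hold_rounds` rounds, each remaining waiter fails once per round and
    each failed attempt moves the line once."""
    total = 0
    for k in range(n_cores):
        total += 1                                  # the winning test-and-set
        total += (n_cores - 1 - k) * hold_rounds    # failed spins meanwhile
    return total


print(run_blocking(4))       # ([0, 1, 2, 3], 4)
print(spin_transfers(4, 3))  # 22
```

Under these assumptions, four contending cores need one line transfer per acquisition in the blocking model (four in total), while the spinning baseline with a three-round critical section needs 22. This is the kind of conflict traffic the abstract says the added state removes.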


Zhang X.,Chinese Academy of Sciences | Zhang X.,CAS Institute of Computing Technology | Zhang X.,University of Chinese Academy of Sciences | Zhang X.,Loongson Corporation | And 13 more authors.
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2014

Consistency maintenance of address mappings during indirect-branch handling in a dynamic binary translation (DBT) system was studied. The traditional lock-based consistency maintenance scheme has a major shortcoming: it incurs heavy overhead in both single-threaded and multithreaded execution. Based on an analysis of this shortcoming, a novel optimization approach was proposed. The new method avoids lock operations when handling hot branches by tracking indirect-branch hotspots, and performs redundant address mapping when read-write conflicts are detected. For conflict detection, a dedicated mechanism was designed to organize the timing sequence of instructions and the address-mapping data. Experiments on the Godson-3 platform emulating the x86 architecture show that the proposed approach reduces execution overhead by 27.7% on average (range 1.8% to 58.5%) for single-threaded benchmarks and by 18.4% on average (range 3.3% to 64.6%) for multithreaded benchmarks.
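
To make the idea of a lock-free hot path concrete, here is a minimal sketch in Python. It swaps in a standard seqlock pattern: readers check a version counter instead of taking a lock, and fall back to a locked slow path on conflict. The paper's own conflict-detection mechanism is not described in enough detail to reproduce. `BranchTargetCache`, `lookup_fast`, and the `translate` callback are hypothetical names.

```python
import threading
from typing import Callable, Dict, Optional


class BranchTargetCache:
    """Hypothetical guest-PC -> host-PC map for indirect-branch lookups.

    Readers take no lock (seqlock pattern): they sample a version counter,
    read the entry, then re-check the counter. Writers make the counter odd
    while updating and even again once the update is published.
    """

    def __init__(self) -> None:
        self._version = 0                     # even = stable, odd = write in progress
        self._table: Dict[int, int] = {}
        self._write_lock = threading.Lock()   # serializes writers only
        self.slow_path_hits = 0

    def insert(self, guest_pc: int, host_pc: int) -> None:
        with self._write_lock:
            self._version += 1                # odd: concurrent readers will bail out
            self._table[guest_pc] = host_pc
            self._version += 1                # even: update is visible and stable

    def lookup_fast(self, guest_pc: int) -> Optional[int]:
        """Lock-free hot path; None means a miss or a detected read-write conflict."""
        before = self._version
        if before & 1:                        # a writer is mid-update
            return None
        host = self._table.get(guest_pc)
        if self._version != before:           # table changed while we read it
            return None
        return host

    def lookup(self, guest_pc: int, translate: Callable[[int], int]) -> int:
        host = self.lookup_fast(guest_pc)
        if host is not None:
            return host
        # Slow path: re-check under the lock, translate on a genuine miss.
        self.slow_path_hits += 1
        with self._write_lock:
            host = self._table.get(guest_pc)
        if host is None:
            # Two threads may both translate the same block here; the duplicate
            # mapping is tolerated rather than serializing every lookup.
            host = translate(guest_pc)
            self.insert(guest_pc, host)
        return host


def fake_translate(guest_pc: int) -> int:
    """Stand-in for a real translator: pretend host code lives 0x1000 higher."""
    return guest_pc + 0x1000


cache = BranchTargetCache()
print(hex(cache.lookup(0x400, fake_translate)))                       # 0x1400
print(hex(cache.lookup(0x400, fake_translate)), cache.slow_path_hits)  # 0x1400 1
```

Because CPython's global interpreter lock hides most data races, this shows only the control flow. A real DBT runtime would implement the version checks with atomic loads and stores and appropriate memory ordering. Tolerating a duplicate translation on the slow path keeps readers lock-free at the cost of occasional redundant work, which loosely mirrors the abstract's redundant address mapping on conflict.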


Liu H.,Chinese Academy of Sciences | Liu H.,CAS Institute of Computing Technology | Liu H.,University of Chinese Academy of Sciences | Liu H.,Loongson Corporation | And 12 more authors.
Proceedings - 2013 IEEE 8th International Conference on Networking, Architecture and Storage, NAS 2013 | Year: 2013

Compared with homogeneous multi-core processors, heterogeneous multi-core processors have strong potential for improving performance, energy efficiency, and area efficiency. Existing execution migration methods for heterogeneous multi-core processors suffer in efficiency, cost, compatibility, or programmability. In this paper, we propose a HW/SW co-design migration method based on binary instrumentation. Our method takes full advantage of the shared ISA and enhances the performance of heterogeneous chip multiprocessors at low hardware/software cost, without requiring any modification to the source code or the compilation system. Experimental results show that the efficiency of our method is 3.29 times that of kernel simulation. © 2013 IEEE.


Liu H.,Chinese Academy of Sciences | Liu H.,CAS Institute of Computing Technology | Liu H.,University of Chinese Academy of Sciences | Liu H.,Loongson Corporation | And 10 more authors.
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2014

The characteristics of execution migration in heterogeneous multi-core processors were analyzed, and a binary-instrumentation-based execution migration method for heterogeneous multi-core processors was put forward. It addresses the drawbacks of existing methods for migrating execution between shared-ISA heterogeneous cores in efficiency, cost, compatibility, or programmability. The proposed method takes full advantage of shared-ISA heterogeneous multi-cores to enhance the performance of heterogeneous chip multiprocessors at low cost, and it requires no modification of the source code or the compilation system. Experimental results from tests with SPEC programs showed that its run-time efficiency was 2.25 times that of kernel simulation.
