Loongson Technology Corporation Ltd

Beijing, China


News Article | April 27, 2017
Site: www.businesswire.com

BEAVERTON, Ore.--(BUSINESS WIRE)--The Unified Extensible Firmware Interface (UEFI) Forum held its Spring 2017 UEFI Seminar and Plugfest last month in Nanjing, China, hosting firmware thought leaders and testing the latest technology, a sign of high interest in UEFI in China’s technology market. The UEFI Forum, a non-profit industry standards body responsible for developing, managing and promoting UEFI specifications, co-hosted the event with Byosoft Co., LTD, with sponsorship from American Megatrends Inc., ARM, Huawei, Insyde Software, Intel Corporation, Nanjing High-Tech Zone and Phoenix Technologies.

The Spring 2017 UEFI Seminar and Plugfest took place on March 27-31 at the Wanda Hilton Hotel in Nanjing, China, and gave firmware thought leaders, developers and researchers an opportunity to learn about the latest firmware innovations and test UEFI platforms and devices. The keynote, titled “UEFI Has Promoted the Development of China Computer Industry,” was presented by Guangnan Ni of the Chinese Academy of Engineering, a critical force in bringing UEFI to China.

In anticipation of the first China-based UEFI event in ten years, over 20 new members in China joined the UEFI Forum, indicating significant interest in UEFI technology in the greater China region. Also in attendance from the region were prominent member companies including H3C, Inspur, Lenovo, Loongson Technology Corporation Limited, and Sugon. While the Chinese market has widely adopted UEFI technology in traditional devices like PCs and servers and used it for CPU and OS development, the Forum continues to witness UEFI adoption in IoT, virtual reality, storage, cloud-computing and datacenter platforms.

“This event encouraged increased adoption of UEFI as an enabling technology that meets China’s interest in supporting indigenous technologies,” Mark Doran, UEFI Forum President, said. “UEFI is a common framework; it’s a bridge between different market segments and architectures. Ultimately, UEFI creates new opportunities for business and developers and encourages the continued international evolution of the open source community.”

Through a collaborative approach with world-class companies, institutions and experts, the UEFI Forum advances innovation in firmware technology standards. These extensible, globally-adopted UEFI specifications bring new functionality and enhanced security to the evolution of devices, firmware and operating systems. The Forum also collaborates with other standards groups that are essential to computing. For more information about the UEFI Forum and current specifications go to www.uefi.org.

Liu Z.,CAS Institute of Computing Technology | Liu Z.,University of Chinese Academy of Sciences | Wang J.,CAS Institute of Computing Technology | Wang J.,University of Chinese Academy of Sciences | And 3 more authors.
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2017

To reduce the delay and the arrival-time difference of a leading-one-detector circuit, a sparse-tree-based leading-one-detector structure and its dynamic circuit implementation method are presented. Two new logical Boolean operations are defined by recursive expressions to form the nodes of the sparse tree. The precalculated clock skew in the pre-charge stage can be accurately controlled, so the leakage power and the arrival-time difference can be controlled simultaneously. To handle varying numbers of input ports, the network is extended by connecting identical units, which shortens design time. The proposed structure has fewer logic levels and lower fan-out. The whole circuit was verified with pseudo-random test vectors. The technique can reduce the critical path length by 20% and keep the arrival-time difference within 1 picosecond. © 2017, Executive Office of the Journal. All rights reserved.
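
As a rough software illustration of what a tree-structured leading-one detector computes, the sketch below merges results from the two halves of the input at each tree node. It is a plain binary tree written for this listing, not the sparse tree or the Boolean operators defined in the paper; the function name and the convention that bits[0] is the most significant bit are assumptions.

```python
def leading_one_tree(bits):
    """Return (found, index) for the most significant 1 in `bits`.

    bits[0] is treated as the most significant bit (an assumption of
    this sketch). Each recursion level corresponds to one tree level.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("bits must not be empty")
    if n == 1:
        # Leaf node: the bit itself says whether a one was found.
        return (bits[0] == 1, 0)
    half = n // 2
    hi_found, hi_idx = leading_one_tree(bits[:half])
    lo_found, lo_idx = leading_one_tree(bits[half:])
    # Merge node: the upper half wins whenever it contains a one.
    if hi_found:
        return (True, hi_idx)
    if lo_found:
        return (True, half + lo_idx)
    return (False, 0)
```

For an 8-bit input the recursion is three levels deep, which mirrors why tree structures are attractive for shortening the critical path.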

Chen L.,Chinese Academy of Sciences | Wang Y.,Chinese Academy of Sciences | Wang H.,Chinese Academy of Sciences | Wang H.,CAS Institute of Computing Technology | And 8 more authors.
Future Generation Computer Systems | Year: 2016

As the number of cores in chip multiprocessors (CMPs) increases rapidly, networks-on-chip (NoCs) have come to play a major role in ensuring performance and power scalability. In this paper, we propose multiple-combinational-channel (MCC), a load-balancing and deadlock-free interconnect network for cache-coherent non-uniform memory access (CC-NUMA). Because messages transmitted over the NoC have different widths and injection rates, we combine low-usage channels and make high-usage channels independent and wide enough, which balances the load and reduces power dissipation. Furthermore, based on an in-depth analysis of network traffic, we summarize four traffic patterns and establish several rules to avoid protocol-level deadlocks. We implement MCC on a 16-core CMP and evaluate the workload balance, area, power and performance using universal workloads. The experimental results show that MCC reduces power by nearly 21% compared with multiple-physical-channel at similar throughput. Moreover, MCC improves performance by 10% with similar area and power, compared to a packet-switching architecture with virtual channels. © 2015 Elsevier B.V.

Fu J.,University of Chinese Academy of Sciences | Fu J.,CAS Institute of Computing Technology | Jin G.,Loongson Technology Corporation Ltd | Zhang L.,CAS Institute of Computing Technology | Wang J.,CAS Institute of Computing Technology
2016 ACM International Conference on Computing Frontiers - Proceedings | Year: 2016

Dynamic compilation has a great impact on the performance of virtual machines. In this paper, we study the features of dynamic compilation and then unveil objectives for optimizing dynamic compilation systems. Following these objectives, we propose a novel dynamic compilation scheduling algorithm called combined analysis with online sifting (CAOS). It consists of a combined priority analysis model and an online sifting mechanism. The combined priority analysis model is used to determine the priority of methods while scheduling, aiming at reconciling responsiveness with the average delay of compilation queue. By performing online sifting, runtime overhead can be further reduced since methods with little benefit to performance are sifted out. CAOS can significantly improve the startup performance of applications. Experimental results show that CAOS achieves 14.0% improvement of startup performance on average, and the highest performance boost is up to 55.1%. With the virtue of high versatility and easy implementation, CAOS can be applied to most dynamic compilation systems.

Wu R.,Chinese Academy of Sciences | Wu R.,University of Chinese Academy of Sciences | Wu R.,Loongson Technology Corporation Ltd | Tai Y.,Oracle Inc.
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2015

The cost of virtual machine exit and restoration was studied, and a method based on delayed storing was proposed to reduce the cost of saving and restoring registers when virtual machines exit or resume. The method reduces the number of registers to be saved and restored by changing the source code of the virtual machine software and checking whether the virtual machine being resumed is the same one that exited last time. The proposed method needs no hardware change and supports multicore operating systems and concurrent operation of multiple virtual machines, which gives it wide applicability. Tests conducted on the Loongson 3A1500 platform showed that the proposed method reduced the cost of virtual machine exit by 65% compared with the existing method, and increased the performance of the whole virtual machine by 3% to 10%. © 2015, Inst. of Scientific and Technical Information of China. All rights reserved.
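
The idea of deferring the save until it is known that a different virtual machine will run can be illustrated with a toy model. This is a sketch under assumptions, not the paper's implementation: the class and method names are hypothetical, registers are modelled as a dictionary, and the host is assumed not to modify the deferred registers between an exit and the next resume.

```python
class LazyRegisterFile:
    """Toy model of deferred (lazy) register save/restore across VM exits."""

    def __init__(self):
        self.hw = {}        # registers currently live in "hardware"
        self.owner = None   # VM whose registers are live in self.hw
        self.saved = {}     # vm_id -> saved copy of that VM's registers
        self.copies = 0     # number of full save or restore copies performed

    def vm_exit(self, vm_id):
        # Deferred: do not copy anything yet, only remember the owner.
        self.owner = vm_id

    def vm_resume(self, vm_id):
        if self.owner == vm_id:
            # Same VM as the one that exited last: registers are still valid.
            return
        if self.owner is not None:
            # A different VM is resuming: now the previous state must be saved.
            self.saved[self.owner] = dict(self.hw)
            self.copies += 1
        # Load the resuming VM's registers (empty on its first run).
        self.hw = dict(self.saved.get(vm_id, {}))
        self.copies += 1
        self.owner = vm_id
```

In this model, a VM that exits and resumes repeatedly with no other VM scheduled in between incurs no copies at all, which is the case the same-VM check targets.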

Fu J.,University of Chinese Academy of Sciences | Fu J.,CAS Institute of Computing Technology | Jin G.,Loongson Technology Corporation Ltd | Zhang L.,CAS Institute of Computing Technology | Wang J.,CAS Institute of Computing Technology
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2016

To reduce the overhead caused by instruction dispatch and improve the performance of interpreters, an instruction dispatch approach based on hardware-software co-design is proposed. Its main idea is to eliminate the expensive operation of constant address loading by optimizing the instruction dispatch table on the software side, and to accelerate memory access by enhancing the processor's instruction set on the hardware side. The hardware-software co-design can minimize the runtime overhead of instruction dispatch, thus improving the performance of interpreters. The experimental results showed that the proposed approach significantly improved the performance of interpreters. For the SPECjvm98 and DaCapo benchmarks, the overall performance of interpreters was improved by 11.5%, and the highest performance boost was up to 15.4%. The approach is highly versatile, easy to implement and can be applied to the design and implementation of high-performance interpreters on mainstream processors. © 2016, Inst. of Scientific and Technical Information of China. All rights reserved.
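
For readers unfamiliar with the term, instruction dispatch is the step in which an interpreter looks up and jumps to the handler for the next opcode. The sketch below shows only that baseline table-driven dispatch for a tiny stack machine; the opcodes and handler names are hypothetical, and neither the paper's table optimization nor its instruction-set extension is modelled.

```python
# Hypothetical opcodes for a tiny stack machine.
PUSH, ADD, MUL = 0, 1, 2

def run(program):
    """Interpret a list of (opcode, argument) pairs and return the top of stack."""
    stack = []

    def op_push(arg):
        stack.append(arg)

    def op_add(_arg):
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)

    def op_mul(_arg):
        b = stack.pop()
        a = stack.pop()
        stack.append(a * b)

    # Dispatch table: the opcode is the index of its handler.
    table = [op_push, op_add, op_mul]

    for opcode, arg in program:
        # Dispatch step: one table lookup plus one indirect call per instruction.
        table[opcode](arg)
    return stack[-1]
```

For example, run([(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None)]) evaluates (2 + 3) * 4. The lookup and indirect call happen once per interpreted instruction, which is why dispatch overhead matters for interpreter performance.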

Zeng L.,CAS Institute of Computing Technology | Zeng L.,University of Chinese Academy of Sciences | Li P.,CAS Institute of Computing Technology | Li P.,University of Chinese Academy of Sciences | Wang H.,Loongson Technology Corporation Ltd
Gaojishu Tongxin/Chinese High Technology Letters | Year: 2016

Cache compression was studied to increase the effective capacity of the cache, and a region cooperative compression (RCC) algorithm was proposed to improve the compression ratio of the last-level cache. Unlike traditional cache compression algorithms, RCC exploits compression locality to compress the cache blocks in a cache region with the cooperation of the first block in the region, instead of compressing the whole cache region. RCC effectively exploits the duplication across the cache blocks in a cache region and achieves a compression ratio comparable to dictionary compression approaches that use the whole cache region as the compression granularity, without increasing the (de)compression latency. The experimental results show that RCC improves the average compression ratio by 12.34% over the C-PACK compression algorithm, which yields a performance improvement of 5%. Compared to an uncompressed cache of double the size, the effective capacity increases by 27%, the performance increases by 8.6% and the area decreases by 63.1%. © 2016, Inst. of Scientific and Technical Information of China. All rights reserved.

Hu W.,Chinese Academy of Sciences | Yang L.,Loongson Technology Corporation Ltd | Fan B.,Loongson Technology Corporation Ltd | Wang H.,Loongson Technology Corporation Ltd | Chen Y.,Chinese Academy of Sciences
IEEE Journal of Solid-State Circuits | Year: 2014

This paper is an extension of Hu et al., ISSCC, 2013, and it introduces the 32/28 nm implementations of Godson-3B1500, which are 8-core MIPS-compatible microprocessors with vector extensions. Godson-3B1500 is fabricated in STMicroelectronics 32/28 nm high-κ metal-gate low-power bulk CMOS with 10 metal layers. It contains 1.14 billion transistors and operates at the frequency of 1.0 GHz to 1.5 GHz with the voltage supply ranging from 1.0 V to 1.3 V. Compared to its predecessor (Hu et al., ISSCC, 2011), Godson-3B1500 brings significant power efficiency improvements with enhanced performance (150GFLOPS@1.2 GHz) and reduced power dissipation (< 40 W), due to not only technology scaling but also a great deal of design efforts. © 2013 IEEE.

Wu Y.,CAS Institute of Computing Technology | Wu Y.,University of Chinese Academy of Sciences | Lu C.,University of Chinese Academy of Sciences | Lu C.,Loongson Technology Corporation Ltd | Chen Y.,CAS Institute of Computing Technology
Frontiers of Computer Science | Year: 2016

With the rapid development of the semiconductor industry, the number of cores integrated on a chip is increasing quickly, which brings tough challenges such as bandwidth, scalability and power to on-chip interconnection. Against this background, Network-on-Chip (NoC) was proposed and is gradually replacing traditional on-chip interconnections such as the shared bus and crossbar. For convenience of physical layout, mesh is the most widely used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on the latency and throughput of the network, and thus plays a vital role in a well-performing network. This study mainly focuses on the routing algorithms of mesh NoC. According to whether network information is taken into consideration in routing decisions, NoC routing algorithms can be roughly classified into oblivious routing and adaptive routing. Oblivious routing costs less but lacks adaptiveness, while adaptive routing is the opposite. To combine the advantages of oblivious and adaptive routing, half-adaptive algorithms were proposed. In this paper, the concepts, taxonomy and features of NoC routing algorithms are introduced. Then the importance of routing algorithms in mesh NoC is highlighted, and representative routing algorithms with their respective features are reviewed and summarized. Finally, we try to shed light on future work on NoC routing algorithms. © 2016 Higher Education Press and Springer-Verlag Berlin Heidelberg
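
A well-known example of oblivious routing on a mesh is dimension-order (XY) routing, in which a packet first travels along the X dimension and then along the Y dimension, regardless of network load. The abstract does not name a specific algorithm, so the sketch below is offered only as an illustration; the function name and the (x, y) coordinate convention are assumptions.

```python
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst under XY routing.

    Nodes are (x, y) tuples. The path depends only on src and dst,
    never on network state, which is what makes the routing oblivious.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

The hop count always equals the Manhattan distance between the two nodes. An adaptive algorithm would instead choose among the minimal paths based on observed congestion.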

Zhu X.-J.,Chinese Academy of Sciences | Zhu X.-J.,Loongson Technology Corporation Ltd
Jisuanji Xuebao/Chinese Journal of Computers | Year: 2011

The development of integrated circuits has increased the number of on-chip cores. Communication among the cores demands higher throughput, lower latency and better scalability. Traditional on-chip buses cannot satisfy these needs, so researchers have proposed a new interconnect architecture called network on chip. To meet the special demands of network on chip, this paper presents a scalable topology named Rgrid and its routing algorithm, called DR. Rgrid reduces the average hop count between on-chip cores, and its physical implementation is much easier than that of the Torus topology. The author implements the Rgrid and Mesh topologies in the Godson3 simulator. The simulation results show that, for the Splash2 benchmarks, the simulator achieves much better performance with the Rgrid topology than with the Mesh topology. Compared to the Mesh topology, Rgrid increases the IPC of the benchmarks by 0.5%~148% and reduces the average latency by 5%~81%.
