Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis

Hangzhou, China


Zhang J., Hangzhou Dianzi University | Zhang J., Key Laboratory of Complex Systems Modeling and Simulation | Zhang J., Zhejiang University | Zhang J., Zhejiang University of Science and Technology | And 15 more authors.
Mobile Information Systems | Year: 2017

As mobile systems develop, mobile devices bring many benefits and conveniences; at the same time, the information gathered by smartphones, such as location and environment, is valuable to businesses that want to provide more intelligent services to customers. Machine learning methods, especially convolutional neural networks, are increasingly used in mobile information systems to study user behavior and classify usage patterns. As the number of model parameters and the scale of the data grow, traditional single-machine training can no longer meet the time requirements of practical application scenarios. Current training frameworks often use simple data-parallel or model-parallel methods to speed up training, which leaves heterogeneous computing resources underutilized. To solve these problems, our paper proposes a delay synchronization parallel strategy for convolutional neural networks that leverages heterogeneous systems. The strategy combines synchronous and asynchronous parallel approaches; it reduces the training process's dependence on the heterogeneous architecture while still ensuring model convergence, so the convolutional neural network framework adapts better to different heterogeneous system environments. The experimental results show that the proposed delay synchronization strategy achieves a speedup of at least three times over traditional data parallelism. © 2017 Jilin Zhang et al.
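The abstract does not spell out the delay synchronization rule, so the following is only a toy model of the general idea of mixing synchronous and asynchronous updates, not the authors' method. It assumes a bounded-delay rule: a worker may run ahead of the slowest worker by at most `max_delay` iterations. The function name `ticks_to_finish`, the per-worker `costs`, and the tick-based clock are all hypothetical and chosen for illustration.

```python
def ticks_to_finish(costs, total_updates, max_delay):
    """Simulate workers of different speeds sharing one pool of updates.

    costs[w] is the (assumed) number of ticks worker w needs per iteration.
    A worker is blocked while it is more than max_delay iterations ahead of
    the slowest worker. max_delay=0 behaves like a synchronous barrier.
    Returns the number of ticks until total_updates iterations are done.
    """
    n = len(costs)
    clocks = [0] * n          # iterations completed by each worker
    remaining = list(costs)   # ticks left in each worker's current iteration
    ticks = 0
    while sum(clocks) < total_updates:
        ticks += 1
        slowest = min(clocks)  # snapshot taken at the start of the tick
        for w in range(n):
            if clocks[w] - slowest > max_delay:
                continue       # too far ahead: wait for slower workers
            remaining[w] -= 1
            if remaining[w] == 0:
                clocks[w] += 1
                remaining[w] = costs[w]
    return ticks


# One fast worker (1 tick per iteration) and one slow worker (3 ticks).
sync_ticks = ticks_to_finish([1, 3], total_updates=8, max_delay=0)
delayed_ticks = ticks_to_finish([1, 3], total_updates=8, max_delay=2)
print(sync_ticks, delayed_ticks)
```

In this model the fast worker idles less when a bounded delay is allowed, so the same number of updates finishes in fewer ticks. The model ignores the effect of stale parameters on convergence, which the paper addresses separately.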


Zhang J., Hangzhou Dianzi University | Zhang J., Key Laboratory of Complex Systems Modeling and Simulation | Zhang J., Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis | Wan J., Hangzhou Dianzi University | And 18 more authors.
Future Generation Computer Systems | Year: 2015

In this paper, we improve the sparse matrix storage format to optimize the data locality and parallel performance of the sparse matrix-vector multiplication (SpMVM) algorithm. First, we propose a cache oblivious extension quadtree storage structure (COEQT), in which the sparse matrix is recursively divided into sub-regions that fit well into cache, improving data locality. We then present a COEQT-based SpMVM algorithm and optimize its performance through manual vectorization. With this storage format, the original SpMVM is divided into computations on relatively independent small matrices. This region-based computation framework is also suitable for high performance computing in distributed environments, so we finally present a parallel SpMVM algorithm based on the proposed COEQT. Extensive experiments show that SpMVM using the COEQT storage format achieves on average a 1.1-1.5× speedup over the CSR format, and higher performance still with instruction-level optimization techniques. Experiments on the Lenovo Deepcomp 7000 demonstrate that this method achieves on average a 1.63× speedup over the Intel Cluster Math Kernel Library implementation. © 2015 Elsevier B.V.
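The recursive subdivision described in the abstract can be illustrated with a generic quadtree over a sparse matrix. This is a minimal sketch and not the COEQT format itself, whose layout the abstract does not give. It assumes a square matrix whose side is a power of two, stores each leaf as a plain list of `(row, col, value)` triples, and uses the hypothetical names `build_quadtree`, `spmv`, and `leaf_size`.

```python
def build_quadtree(entries, r0, c0, size, leaf_size):
    """Recursively split the region [r0, r0+size) x [c0, c0+size).

    entries is a list of (row, col, value) triples inside the region.
    size is assumed to be a power of two. Empty regions become None,
    and regions no wider than leaf_size become leaves.
    """
    if not entries:
        return None
    if size <= leaf_size:
        return ("leaf", entries)
    half = size // 2
    quads = [[], [], [], []]  # top-left, top-right, bottom-left, bottom-right
    for r, c, v in entries:
        idx = (2 if r >= r0 + half else 0) + (1 if c >= c0 + half else 0)
        quads[idx].append((r, c, v))
    children = [
        build_quadtree(quads[0], r0, c0, half, leaf_size),
        build_quadtree(quads[1], r0, c0 + half, half, leaf_size),
        build_quadtree(quads[2], r0 + half, c0, half, leaf_size),
        build_quadtree(quads[3], r0 + half, c0 + half, half, leaf_size),
    ]
    return ("node", children)


def spmv(tree, x, y):
    """Accumulate y += A @ x by visiting one sub-region at a time."""
    if tree is None:
        return
    kind, payload = tree
    if kind == "leaf":
        for r, c, v in payload:
            y[r] += v * x[c]
    else:
        for child in payload:
            spmv(child, x, y)


# A 4x4 matrix with four nonzeros, split down to 2x2 leaves.
entries = [(0, 0, 2.0), (1, 3, 1.0), (2, 1, 3.0), (3, 3, 4.0)]
tree = build_quadtree(entries, 0, 0, 4, leaf_size=2)
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0] * 4
spmv(tree, x, y)
print(y)  # [2.0, 4.0, 6.0, 16.0]
```

Each leaf touches only a small window of `x` and `y`, which is the locality property the abstract attributes to sub-regions that fit in cache. Because the leaves are independent of one another, they are also natural units to distribute across processes.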
