Zhang J.,Hangzhou Dianzi University |
Zhang J.,Key Laboratory of Complex Systems Modeling and Simulation |
Zhang J.,Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis |
Wan J.,Hangzhou Dianzi University |
And 18 more authors.
Future Generation Computer Systems | Year: 2015
In this paper, we elaborate on improving the sparse matrix storage format to optimize the data locality of sparse matrix-vector multiplication (SpMVM) algorithm, and its parallel performance. First of all, we propose a cache oblivious extension quadtree storage structure (COEQT), in which the sparse matrix is recursively divided into sub-regions that can well fit into cache to improve the data locality. Later on, we present a COEQT based SpMVM algorithm and optimize its performance through manual vectorization. With this storage format, the original SpMVM is divided into computations of relatively independent small matrices. In addition, this region-based computation framework is also suitable for high performance computing in distributed computing environment. So, we finally present a parallel SpMVM algorithm based on the proposed COEQT. Extensive and comprehensive experiments show that the sparse matrix-vector multiplication using the COEQT storage format achieves on average 1.1-1.5. × speedup compared with CSR format and further higher performance through instruction level optimization techniques. The experiment in Lenovo Deepcomp 7000 demonstrates that this method achieves on average 1.63× speedup compared with the Intel Cluster Math Kernel Library implementation. © 2015 Elsevier B.V.