A1 Journal article (refereed)
On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method (2018)
Myllykoski, M., Rossi, T., & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115(May), 56-66. https://doi.org/10.1016/j.jpdc.2018.01.004
JYU authors or editors
Publication details
All authors or editors: Myllykoski, Mirko; Rossi, Tuomo; Toivanen, Jari
Journal or series: Journal of Parallel and Distributed Computing
ISSN: 0743-7315
eISSN: 1096-0848
Publication year: 2018
Volume: 115
Issue number: May
Pages range: 56-66
Publisher: Academic Press
Place of Publication: Maryland Heights
Publication country: United States
Publication language: English
DOI: https://doi.org/10.1016/j.jpdc.2018.01.004
Publication open access: Not open
Publication channel open access:
Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/57129
Abstract
The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is improved using off-line autotuning techniques.
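The roofline model mentioned in the abstract bounds attainable floating point performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The following is a minimal sketch of that bound only; the function names and the device figures (PEAK_GFLOPS, BANDWIDTH_GBS) are hypothetical placeholders chosen for illustration and are not taken from the article.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound in GFLOP/s.

    intensity is the arithmetic intensity in FLOP per byte moved
    to or from off-chip memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which the memory and compute limits meet."""
    return peak_gflops / bandwidth_gbs


# Hypothetical device figures (assumptions, not measurements from the article).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 200.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for intensity in (0.25, 1.0, 4.0, 16.0):
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        regime = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"intensity={intensity:5.2f} FLOP/B  bound={bound:7.1f} GFLOP/s  ({regime})")
```

A kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth in this model, which is the regime the abstract reports for most of the attained performance.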
Keywords: information technology; linear models; reduction (active)
Free keywords: reduction; Fast direct solver; GPU computing; partial solution technique; PSCR method; Roofline model; Separable block tridiagonal linear system
Contributing organizations
Related projects
- Efficient Numerical Methods for Acoustic Wave Propagation
- Toivanen, Jari
- Research Council of Finland
Ministry reporting: Yes
VIRTA submission year: 2018
JUFO rating: 2