A Scalable and Ultrafast Eigensolver for Three Dimensional Photonic Crystals on GPU
Date
2017
Abstract
No Chinese abstract available.
This research applies parallel computation on a GPU with CUDA to solve the three-dimensional Maxwell equations on a face-centered cubic (FCC) lattice. We focus on how to solve the resulting eigenvalue problem more efficiently. Because the problem is Hermitian and positive definite, the solver is based on the inverse Lanczos method for the eigenvalue problem, with the conjugate gradient method for the associated linear systems. By using cuBLAS and cuFFT, combining kernels, transposing multiple matrices simultaneously, and applying several other optimizations, we reduce the time spent on computation and memory access. With all of these techniques combined, each eigenvalue problem in a set of problems of dimension 5.184 million is solved for its 10 smallest positive eigenvalues within 44 to 63 seconds. The solver also scales well across multiple GPU cards using MPI. All results were computed on two clusters. The first is equipped with two NVIDIA Tesla K40c GPU cards, and most of the work was computed there. The second is equipped with many M2070 GPU cards, which were used for the MPI runs.
Keywords
Maxwell equation, band structure, face-centered cubic lattice, GPU, CUDA, MPI, cuBLAS, cuFFT