Chinese Optics Letters, Volume. 14, Issue 7, 071701(2016)
GPU accelerated simplified harmonic spherical approximation equations for three-dimensional optical imaging
Simplified spherical harmonics approximation (SPN) equations are widely used in modeling light propagation in biological tissues. However, with the increase of order
Three-dimensional (3D) optical imaging technology has been widely used in the field of biomedical imaging because of its significant advantages of noninvasive detection, high temporal resolution, and low cost[
To reduce the computational burden of SPN equations, Li
In this Letter, a GPU-accelerated framework is proposed to make the SPN equations more efficient to apply (hereafter referred to as the GPU-based method). The accelerated framework is implemented using the compute unified device architecture (CUDA)[
The accuracy of the GPU-based method was first evaluated by comparing it with Monte Carlo (MC) simulation. Then, the speed-up ratio of the GPU-based method over conventional CPU computation (the CPU-based method) was evaluated. The influence of the mesh size and mesh structure on the acceleration performance was also investigated. Finally, the performance of a parallel CG solver was evaluated with different thread organizations in the kernel function.
Based on the RTE, the SPN equations and the boundary conditions can be detailed as follows[
Incorporating the boundary conditions, the exiting partial current
From the above derivation of the finite element solver for the SPN equations, the main procedure consists of assembling the system matrix and solving the linear equations, both of which can be parallelized on a GPU.
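The two stages named above, assembly and solution, can be illustrated with a minimal serial sketch in Python. It is not the authors' CUDA implementation: the function names (`assemble_csr`, `csr_matvec`, `conjugate_gradient`), the tolerance, and the iteration limit are hypothetical choices made for illustration, and the matrix is assumed to be symmetric positive definite, as the CG method requires.

```python
# Illustrative sketch only: serial assembly into CSR storage followed by a
# conjugate gradient (CG) solve. All names and parameters are assumptions.

def assemble_csr(n, triplets):
    """Sum duplicate (row, col, value) contributions and return CSR arrays."""
    rows = [dict() for _ in range(n)]
    for i, j, v in triplets:
        rows[i][j] = rows[i].get(j, 0.0) + v
    row_ptr, col_idx, values = [0], [], []
    for r in rows:
        for j in sorted(r):
            col_idx.append(j)
            values.append(r[j])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix stored in CSR format."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for it in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, it
        ap = csr_matvec(row_ptr, col_idx, values, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter
```

Each CG iteration is dominated by one sparse matrix-vector product plus a few vector updates and dot products, which is why these operations are the natural targets for GPU parallelization.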
The flowchart of the proposed GPU-based accelerated SPN equations is shown in Fig.
Figure 1. Flowchart of the GPU-accelerated framework for the SPN equations.
The CG iterative solver relies mainly on fast vector products and additions, whose speed is heavily influenced by memory access and data distribution on the GPU. In this study, the kernel function for calculating the product of the sparse matrix and a vector is optimized by using the shared memory of each block. To accelerate the process, each row of the sparse matrix is assigned to a warp to calculate the product and addition[
Figure 2. Kernel function of matrix-vector multiplication for the CSR sparse matrix format using one 32-thread warp per matrix row.
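The row-per-warp scheme described above can be emulated serially to make the data flow explicit. The following Python sketch is an illustration under stated assumptions, not the kernel used in this work: the function name `spmv_csr_cooperative` and the parameter `coop_size` are hypothetical, the list `partial` merely stands in for per-block shared memory, and the loops over lanes would run concurrently on a GPU.

```python
# Illustrative serial emulation of a CSR matrix-vector product in which
# coop_size lanes (threads) cooperate on each matrix row. Names are assumptions.

def spmv_csr_cooperative(row_ptr, col_idx, values, x, coop_size=32):
    """Return y = A x, mimicking a strided per-lane sum and a tree reduction."""
    if coop_size <= 0 or coop_size & (coop_size - 1):
        raise ValueError("coop_size must be a positive power of two")
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # One partial sum per lane; on a GPU this would live in shared memory.
        partial = [0.0] * coop_size
        for lane in range(coop_size):
            # Lane `lane` handles entries start+lane, start+lane+coop_size, ...
            for k in range(start + lane, end, coop_size):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction: halve the number of active lanes at each step.
        stride = coop_size // 2
        while stride > 0:
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y.append(partial[0])
    return y
```

With `coop_size=32` the sketch corresponds to one full warp per row; smaller powers of two model the case where several rows share a warp.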
We evaluated the performance of the proposed GPU-based method against the CPU-based method on a computer with a 2.4 GHz Intel Xeon 5440 processor and an NVIDIA Tesla C2050 GPU. Both the CPU and GPU implementations used double-precision floating-point arithmetic.
Firstly, the accuracy of the GPU-accelerated SPN equations was validated on two kinds of phantoms by comparing it with the MC simulation[
As shown in Figs.
Figure 3. Phantoms used in accuracy validation: (a) homogeneous cylindrical phantom; (b) digital-mouse-model-based phantom; (c) and (d) comparative results between the GPU-accelerated SPN equations (
The digital mouse model was used to demonstrate the capability of the proposed method to handle light propagation in tissues with a complex structure. The organs included in the digital mouse are shown in Fig.
Secondly, the acceleration performance of the GPU-based method was investigated by comparing it with the CPU-based method. The phantom used in this investigation had the same size and optical properties as the first phantom used in the accuracy validation. The influence of the size of the system matrix on the acceleration performance was evaluated by discretizing the phantom into different numbers of tetrahedrons, from 3421 to 94528. The ratio of the total time cost of the CPU-based SPN method to that of the GPU-based one is shown in Fig.
Figure 4. (a) Speed-up ratio of the total processing time using the GPU-accelerated SPN method over the CPU-based one. (b) Speed-up ratio of the GPU-accelerated CG solver over the CPU one for solving the system matrix of SP7 equations.
The acceleration performance of the GPU-accelerated CG solver with different values of coopSize and blockSize in the kernel function was also investigated. A cylindrical phantom with the same size and optical properties as that used in the above validation, consisting of 79626 tetrahedrons, was adopted in this investigation. The speed-up ratio of the GPU-accelerated CG solver over the CPU one for solving the system matrix of SP7 equations is shown in Fig.
During the experiments, we made some further observations about the GPU-accelerated CG solver. Although the CG solver is efficient and stable for most linear problems, it can diverge for a large-scale system matrix, especially for high-order SPN equations. The acceleration is strongly influenced by the filling fraction (sparsity) of the system matrix, which differs with the order of the SPN equations. We found that the non-zero elements of the system matrix for the SP1 equation lie closer to the diagonal than those for the SP3, SP5, and SP7 equations. The non-zero elements of the system matrix for the SP7 equations are the most widely scattered, and the sparse system matrix for SP7 is 15 times larger than that for SP1. As a result, the CG solver may diverge for large-scale matrices and high-order SPN equations.
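The two matrix properties discussed above, the filling fraction and how far the non-zero elements lie from the diagonal, can be quantified directly from CSR storage. The sketch below is only one possible way to do so: the function name `sparsity_profile` and the particular metrics (maximum and mean distance from the diagonal) are assumptions chosen for illustration, and the matrix is assumed to be square.

```python
# Illustrative sketch: simple sparsity metrics for a square CSR matrix.
# The metric definitions are assumptions, not taken from this Letter.

def sparsity_profile(row_ptr, col_idx):
    """Return the fill fraction and the spread of non-zeros about the diagonal."""
    n = len(row_ptr) - 1
    nnz = row_ptr[-1]
    # Distance |i - j| of every stored entry (i, j) from the main diagonal.
    offsets = [
        abs(i - col_idx[k])
        for i in range(n)
        for k in range(row_ptr[i], row_ptr[i + 1])
    ]
    return {
        "fill_fraction": nnz / (n * n),
        "bandwidth": max(offsets) if offsets else 0,
        "mean_offset": sum(offsets) / nnz if nnz else 0.0,
    }
```

A matrix whose non-zero elements cluster near the diagonal gives a small `mean_offset`, whereas a more scattered pattern gives a larger value for the same number of non-zero elements.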
The performance of the CG kernel function is strongly influenced by coopSize. In this study, a coopSize of 8 or 16 gave the best performance, which may be attributed to the sparsity of the system matrix. Each row of the system matrix is processed by one warp on the GPU. However, when a row contains fewer non-zero elements than the warp size (32), some threads in the warp are idle. We therefore defined coopSize to avoid thread idling when a row has fewer non-zero elements than the warp size.
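The idle-thread argument can be made concrete with a small calculation. The sketch below assumes that coopSize is the number of threads cooperating on one row, that each thread handles one non-zero element per pass, and that a row has 12 non-zero elements; the function name `lane_utilization` and the row length are hypothetical and chosen only for illustration.

```python
import math

# Illustrative estimate of how many thread slots do useful multiply-add work
# when coop_size threads share one row. The model is an assumption.

def lane_utilization(nnz_in_row, coop_size):
    """Fraction of thread slots that process a non-zero element of the row."""
    if nnz_in_row == 0:
        return 0.0
    passes = math.ceil(nnz_in_row / coop_size)  # strided passes over the row
    return nnz_in_row / (passes * coop_size)


for coop_size in (32, 16, 8, 4):
    print(coop_size, lane_utilization(12, coop_size))
```

For the assumed 12 non-zero elements, a full warp keeps only 12 of 32 threads busy (0.375), whereas 16 or 8 cooperating threads raise the fraction to 0.75. This simple model ignores the cost of the reduction step, memory coalescing, and occupancy, so it indicates a trend rather than predicting the measured speed-up.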
In conclusion, a GPU-based acceleration framework for the SPN equations is proposed for modeling light propagation in 3D optical imaging. The accuracy validation experiments demonstrate that the proposed GPU-accelerated method agrees well with the MC simulation. Furthermore, the acceleration experiments show that the proposed method is substantially faster than the CPU-based method, with a best speed-up ratio of 25 among the observed cases. These results indicate that the proposed GPU-accelerated method is a powerful tool for 3D optical imaging.
[2] E. E. Graves, J. Ripoll, R. Weissleder, V. Ntziachristos. Med. Phys., 30, 901(2003).
[5] Q. Fang. Biomed. Opt. Express, 1, 165(2010).
[10] Z. Yuan, Q. Z. Zhang, E. Sobel, H. B. Jiang. J. Biomed. Opt., 14, 054013(2009).
[12] Y. J. Lu, B. H. Zhu, H. O. Shen, J. C. Rasmussen, G. Wang, E. M. Sevick-Muraca. Proc. SPIE, 7892, 78920F(2011).
[14] W. Li, H. J. Yi, Q. T. Zhang, D. F. Chen, J. M. Liang. Comput. Math. Methods Med., 2012, 394374(2012).
[15] N. Ren, J. Liang, X. Qu, J. Li, B. Lu, J. Tian. Opt. Express, 18, 6811(2010).
[16] M. Schweiger. J. Biomed. Imaging, 2011, 10(2011).
[17] NVIDIA. Compute Unified Device Architecture Programming Guide(2007).
[20] N. Bell, M. Garland. Efficient sparse matrix-vector multiplication on CUDA(2008).
[21] S. Ren, X. Chen, H. Wang, X. Qu, G. Wang, J. Liang, J. Tian. Plos One, 8, e61304(2013).
Shenghan Ren, Xueli Chen, Xu Cao, Shouping Zhu, Jimin Liang, "GPU accelerated simplified harmonic spherical approximation equations for three-dimensional optical imaging," Chin. Opt. Lett. 14, 071701 (2016)
Category: Medical optics and biotechnology
Received: Mar. 3, 2016
Accepted: Apr. 22, 2016
Published Online: Aug. 3, 2018
The Author Email: Jimin Liang (jimleung@mail.xidian.edu.cn)