High-fidelity CUDA-based 3DSIM parallel reconstruction method

Hongyu Wang; Ruijie Cao; Wenyi Wang; Yaning Li; Peng Xi

doi:10.3788/AI.2025.10002

1. Introduction

Structured illumination microscopy (SIM) is currently the most widely used super-resolution technique in the life sciences, because it combines fast imaging with low phototoxicity, both of which are crucial for live-cell imaging^[1]. SIM can be broadly classified into two types: 2D structured illumination microscopy (2DSIM)^[2] and 3D structured illumination microscopy (3DSIM)^[3–6]. 2DSIM uses laterally patterned illumination and a subsequent reconstruction algorithm to improve the lateral resolution by a factor of two beyond the diffraction limit. However, because its illumination is not modulated axially, its axial resolution remains the same as that of wide-field microscopy, which can cause significant errors when reconstructing thick samples. In contrast, 3DSIM also modulates the illumination along the axial direction, thereby doubling the resolution along the $z$ axis as well^[7]. Unlike 2DSIM, which lacks depth information, 3DSIM resolves the depth structure of the cell with high axial resolution ($\sim 300$ nm)^[8].

Reconstruction time^[9] and reconstruction quality^[10,11] are the main indicators of the performance of structured illumination super-resolution reconstruction. Regarding time, fast reconstruction with real-time feedback allows researchers to adjust experimental parameters (such as laser power, exposure time, illumination pattern, focus position, and fluorescent labeling conditions) immediately according to the results. Fast SIM reconstruction and real-time feedback therefore make it possible to optimize multiple experimental parameters comprehensively, which improves experimental efficiency and reduces unnecessary repetitions and waste of resources^[12]. Regarding quality, high-resolution, high-fidelity reconstructions allow researchers to observe the structural features of cells more clearly, for example during cell division, intracellular transport, and dynamic interactions between organelles^[13]. Fast reconstruction also ensures that critical moments of these dynamics are visualized in time, enabling event-triggered imaging^[14].

3DSIM reconstruction is available on commercial systems such as Airy Polar-SIM, GE OMX, and Nikon N-SIM. In addition, open-source software such as AO-3DSIM^[15], CUDAsirecon^[16], SIMnoise^[17], 4BSIM^[18], and Open-3DSIM^[19] can be used to reconstruct 3DSIM datasets. CUDAsirecon achieves a twofold increase in three-dimensional resolution with three-beam structured illumination and has laid a foundation for 3DSIM reconstruction. However, it does not provide a storage strategy optimized for limited graphics processing unit (GPU) memory.

Parameter estimation and parameter-based reconstruction are the two main steps of 3DSIM reconstruction. It is crucial to separate the spectral components of the sample correctly and then shift them to their correct positions^[20], which requires precise estimation of the structured illumination pattern parameters. Among these parameters, the illumination frequency vector defines the spatial frequency of the pattern. Although the frequency vector inherently encodes direction, the angle parameter, which determines the orientation of the pattern, is treated separately for detailed analysis and accurate control of the imaging process. The initial phase, which specifies the starting point of the periodic pattern, and the modulation depth, which reflects the contrast between the maximum and minimum illumination intensity, are also crucial. All of these parameters depend not only on the imaging hardware but also on the refractive index of the specimen, and they strongly influence the imaging results. To determine them accurately, many algorithm-based parameter estimation methods have been proposed, the most common being cross-correlation^[21,22]. Cross-correlation can estimate frequency vectors with sub-pixel accuracy, but its iterative computation inevitably leads to long computation times^[23].
Compared with cross-correlation, non-iterative estimation methods, such as the Fourier-domain phase of the peak (POP)^[24,25], autocorrelation reconstruction (ACR)^[26], and the image recombination transform (IRT)^[27], estimate the initial phase faster but less accurately, because they rely on integer-pixel frequency vectors^[28].
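Why integer-pixel frequency vectors hurt phase accuracy can be illustrated with a toy model. Assuming a 1D fringe pattern with a known, non-integer frequency (the values below are illustrative), a POP-style estimate takes the phase of the spectrum at the peak. Evaluating it at the nearest integer frequency instead of the true sub-pixel frequency introduces a large phase error. This is a simplified sketch, not the estimator of any of the cited methods.

```python
import cmath
import math


def dtft_at(signal, freq):
    """Discrete-time Fourier transform of `signal` at `freq` (cycles per record)."""
    n_samples = len(signal)
    return sum(s * cmath.exp(-2j * math.pi * freq * n / n_samples)
               for n, s in enumerate(signal))


def pop_phase(signal, freq):
    """Phase-of-peak estimate: the angle of the spectrum at the assumed peak frequency."""
    return cmath.phase(dtft_at(signal, freq))


def wrapped_error(estimate, truth):
    """Absolute phase difference wrapped to [0, pi]."""
    return abs(cmath.phase(cmath.exp(1j * (estimate - truth))))


# Synthetic 1D fringe pattern; frequency and phase are illustrative assumptions.
N = 128
TRUE_FREQ = 10.3    # cycles per record, deliberately off the integer grid
TRUE_PHASE = 0.8    # radians
pattern = [1 + 0.9 * math.cos(2 * math.pi * TRUE_FREQ * n / N + TRUE_PHASE)
           for n in range(N)]

err_integer = wrapped_error(pop_phase(pattern, round(TRUE_FREQ)), TRUE_PHASE)
err_subpixel = wrapped_error(pop_phase(pattern, TRUE_FREQ), TRUE_PHASE)
print(f"integer-pixel error: {err_integer:.3f} rad, sub-pixel error: {err_subpixel:.3f} rad")
```

The phase error at the integer frequency comes from the linear phase accumulated by the residual 0.3-cycle mismatch across the record, which is why sub-pixel frequency estimates matter for phase estimation.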

The complexity of these computations is the main reason why 3DSIM super-resolution reconstruction is time-consuming. The reconstruction algorithm is highly parallel and can exploit the computing capabilities of multicore processors or GPUs to increase speed. The data structures of the variables and the design of the computational process are the main factors determining the consumption of computational resources. However, the parallel design of current mainstream open-source algorithms is not well optimized. On the one hand, variable storage is poorly managed: many variables do not release their storage after the calculation that uses them is completed. On the other hand, memory allocation does not track actual requirements, because the memory needed by some variables changes during the computation; the size of the variable space must therefore be adjusted to minimize the overall memory footprint. Such memory optimization is not important for traditional algorithms that use random access memory (RAM) as the main storage medium, because RAM is relatively large and, even when it is exhausted, high-speed hard disks can serve as virtual memory. On a GPU, however, a single mainstream high-performance graphics card offers at most 24 GB of memory. It is therefore necessary to make both the execution process and the variable data structures of the algorithm lightweight.

We therefore propose Cu-3DSIM, a method accelerated with the compute unified device architecture (CUDA), to achieve high-quality, robust reconstruction of 3DSIM stacks. By optimizing storage as far as possible, super-resolution reconstruction of structured illumination at three angles can be completed within the limited GPU memory. The parallel method covers the entire process, from parameter estimation to super-resolution reconstruction. Moreover, to remove reconstruction blur in thick scattering specimens, we further employ the Hilo process, using the 1st-order frequency in place of the central wide-field component, so that optical sectioning can be achieved^[29]. We compare the reconstruction performance and efficiency of different algorithms at multiple scales, including different signal-to-noise ratio (SNR) levels and reconstruction speeds. We demonstrate that, owing to its parallel computing method, Cu-3DSIM reconstructs substantially faster while maintaining the same computational accuracy as the central processing unit (CPU)-based Open-3DSIM. Fast 3DSIM reconstruction is thus achieved while minimizing artifacts and faithfully reconstructing weak signals.

In our approach, a parallel cross-correlation method is employed to estimate the frequency, angle, phase, and modulation depth of the structured illumination patterns. Parallel computing accelerates the traditionally iterative operations and yields sub-pixel frequency vectors. We exploit the +1st and +2nd frequency components simultaneously: using GPU parallelism, the frequency vector is estimated from both the +2nd frequency peak and the +1st spectrum at the same time. If the +2nd-order result is reliable, it is used preferentially; if it is unreliable because of low fringe contrast, the frequency vector estimated from the +1st-order spectrum is used instead. This ensures effective frequency vector estimation and improves the efficiency of parameter estimation. Once the frequency vector is known, the corresponding parameters, such as phase and angle, can be derived. In general, the +2nd frequency peak gives a more accurate estimate than the +1st. In low-SNR samples, however, the +1st frequency peak has higher contrast than the +2nd, so this strategy improves the accuracy of parameter estimation for low-SNR images.
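The selection rule above can be sketched as follows. This is a hypothetical illustration: the `PeakEstimate` type, the `contrast` reliability score, and the `min_contrast` threshold are assumptions rather than the paper's actual criterion. It relies on one fact of three-beam 3DSIM, namely that the ±1st orders lie at half the lateral frequency of the ±2nd orders, so a +1st-order estimate must be doubled before it can stand in for the +2nd.

```python
from dataclasses import dataclass


@dataclass
class PeakEstimate:
    kx: float         # lateral frequency vector, x component (spectrum pixels)
    ky: float         # lateral frequency vector, y component (spectrum pixels)
    contrast: float   # hypothetical reliability score, e.g. peak height / background


def choose_frequency(plus2, plus1, min_contrast=3.0):
    """Return the lateral +2nd-order frequency vector and the order it was derived from.

    min_contrast is a hypothetical threshold, not a value from the paper.
    """
    if plus2.contrast >= min_contrast:
        return (plus2.kx, plus2.ky), "+2"
    # In three-beam 3DSIM the +1st order lies at half the lateral frequency of the
    # +2nd order, so the +1st-order estimate is doubled before it is used.
    return (2 * plus1.kx, 2 * plus1.ky), "+1"


strong = choose_frequency(PeakEstimate(120.4, 35.2, 5.1), PeakEstimate(60.3, 17.5, 8.0))
weak = choose_frequency(PeakEstimate(120.4, 35.2, 1.2), PeakEstimate(60.3, 17.5, 8.0))
print(strong, weak)
```

On the GPU both estimates would be computed concurrently, so the fallback costs no extra wall-clock time; the rule only decides which result to keep.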

A clean background and clear detail are the two main criteria of reconstruction quality. In a typical reconstruction, the background mainly comes from the zeroth-order frequency component and appears in the result when that component is added to the final spectrum. The background can be suppressed with a notch filter, but the filter creates artifacts while removing the background. To solve this problem, the Hilo result, which has a lower background, is used to represent the zeroth-order component. This weakens background interference in the reconstruction and makes the reconstructed details more prominent.
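The idea behind the Hilo combination can be sketched in 1D under simplifying assumptions. In-focus structures carry fringe contrast under structured illumination, whereas out-of-focus background does not. Weighting the low-pass content of the uniform image by local fringe contrast therefore suppresses background, while the high-pass content of the uniform image is kept as is. Box filters stand in for the Gaussian filters usually used, and `eta` is a hypothetical weighting factor. This is not the Cu-3DSIM implementation, in which the Hilo result replaces the zeroth-order component in frequency space.

```python
import math


def box(values, radius):
    """Moving average over a window of 2*radius+1 samples, clipped at the edges."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_std(values, radius):
    """Local standard deviation computed from windowed first and second moments."""
    mean = box(values, radius)
    mean_sq = box([v * v for v in values], radius)
    return [math.sqrt(max(m2 - m * m, 0.0)) for m, m2 in zip(mean, mean_sq)]


def hilo(uniform, structured, radius=8, eta=1.0):
    """Simplified 1D Hilo: high-pass of the uniform image plus a low-pass,
    fringe-contrast-weighted version of it. eta is a hypothetical weighting factor."""
    local_mean = box(uniform, radius)
    diff = [s - u for s, u in zip(structured, uniform)]
    contrast = [c / (m + 1e-12) for c, m in zip(local_std(diff, radius), local_mean)]
    lo = box([c * u for c, u in zip(contrast, uniform)], radius)
    hi = [u - m for u, m in zip(uniform, local_mean)]
    return [h + eta * l for h, l in zip(hi, lo)]


# Toy scene: an in-focus region (fringes visible) and an out-of-focus region (no fringes).
N = 200
uniform = [1.0 if 50 <= n < 100 or 120 <= n < 170 else 0.0 for n in range(N)]
structured = [u * (1 + math.cos(2 * math.pi * n / 8)) if 50 <= n < 100 else u
              for n, u in enumerate(uniform)]
result = hilo(uniform, structured)
print(round(result[75], 2), round(result[145], 2))
```

Both regions have the same wide-field intensity, but only the in-focus region survives, which is the optical-sectioning effect the text relies on.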

2. 3DSIM Parallel Reconstruction Method

2.1. System Parameter Preprocessing for Parallel Reconstruction

Classic 3DSIM super-resolution reconstruction is time-consuming. To speed it up, we use CUDA to parallelize the reconstruction. The raw images are first read into CPU memory and then transferred to GPU memory, where the super-resolution result is computed with the proposed parallel reconstruction algorithm. Image reading and storage are time-consuming because they involve extensive data exchange between the CPU, GPU, and hard disk. In traditional SIM reconstruction algorithms that run mainly on the CPU, this overhead is comparatively small; in GPU-accelerated computing, however, it accounts for a relatively large share of the total time. The GPU parallel method for 3DSIM initialization and reconstruction is shown in Fig. 1(a).

Figure 1. CUDA method of Cu-3DSIM. (a) Method of 3DSIM initialization and the reconstruction process. (b) Parallel GPU acceleration method of the cross-correlation algorithm. (c) Separation matrices for the GPU parallel method.


The system parameters in Fig. 1(a) include fixed parameters of the imaging system, such as the optical transfer function (OTF), wavelength, and numerical aperture (NA). The estimated parameters, such as phase and modulation depth, must be determined algorithmically and are used to optimize the reconstructed image. The get order process extracts the information of the different diffraction orders (zeroth, 1st, and 2nd) from the data. The model OTF is the mathematical model of the system OTF: it is initially calculated from the system parameters (e.g., wavelength and NA) as a theoretical model, and a more precise OTF can be obtained for more accurate reconstruction by incorporating information from the actual optical path, such as measured or experimentally derived data. The shift process in Fig. 1(a) refers to the translation operation performed on the data in the spatial domain, and Sum_fft refers to the summation of data that have undergone a fast Fourier transform (FFT), i.e., the combination of the images in the frequency domain.
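A shift performed in the spatial domain can move a spectrum because of the Fourier shift theorem: multiplying the data by a linear phase ramp translates their spectrum. This lets a separated order be moved by a sub-pixel frequency vector without interpolating in frequency space. The 1D sketch below assumes this phase-ramp formulation and uses a naive DFT for self-containment; it is not taken from the Cu-3DSIM source.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, sufficient for a short demo signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def shift_spectrum(x, shift_bins):
    """Translate the spectrum of x by `shift_bins` by multiplying with a phase ramp.

    Fractional shifts are allowed; they spread energy across neighbouring bins.
    """
    n = len(x)
    return [x[t] * cmath.exp(2j * math.pi * shift_bins * t / n) for t in range(n)]


N = 32
signal = [cmath.exp(2j * math.pi * 3 * t / N) for t in range(N)]   # single peak at bin 3
spectrum_before = [abs(v) for v in dft(signal)]
spectrum_after = [abs(v) for v in dft(shift_spectrum(signal, 5))]
peak_before = max(range(N), key=spectrum_before.__getitem__)
peak_after = max(range(N), key=spectrum_after.__getitem__)
print(peak_before, peak_after)   # 3 8
```

In a GPU implementation the phase ramp is an element-wise multiply, which parallelizes trivially over pixels before the FFT.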

Cu-3DSIM uses CUDA acceleration for robust reconstruction of 3DSIM stacks, and its storage optimization allows the structured illumination data of all three angles to be reconstructed within the limited GPU memory. The GPU kernel structure is described in Sec. S1 of Supplement 1, and the principle of 3DSIM structured illumination is given in Sec. S2 of Supplement 1.

2.2. Automatic Determination of Illumination Pattern Parameters with GPU Acceleration

To estimate the parameters required for reconstruction, the cross-correlation algorithm is computed with a parallel GPU method, as shown in Fig. 1(b). The principle of the cross-correlation algorithm is given in Sec. S3 of Supplement 1. For parallel computing, the threads and blocks are constructed from the rows and columns of the sub-pixel candidate image of each cross-correlation iteration: each pixel in a row is assigned to a thread, and different rows form different blocks. Each sub-pixel cross-correlation iteration constitutes an independent grid, and each grid is launched by a kernel call from the host. With this scheme, one sub-pixel refinement of a parameter is obtained per kernel launch.
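The thread layout described above can be mirrored serially to make the index mapping explicit. In this hypothetical sketch, block `b` handles row `b` of the candidate grid and thread `t` handles column `t` (in CUDA these would be `blockIdx.x` and `threadIdx.x`). Each (block, thread) pair scores one candidate sub-pixel offset around the current estimate. The score function is a made-up placeholder for the cross-correlation between separated orders.

```python
def candidate_offset(block_idx, thread_idx, rows, cols, step):
    """Map (block, thread) to a sub-pixel frequency offset (dkx, dky).

    One block per row of the candidate grid, one thread per column, grid centred
    on the current estimate. In CUDA this would read blockIdx.x / threadIdx.x.
    """
    dky = (block_idx - rows // 2) * step
    dkx = (thread_idx - cols // 2) * step
    return dkx, dky


def launch_grid(rows, cols, step, score):
    """Serial stand-in for one kernel launch: every (block, thread) scores one candidate."""
    best_value, best_offset = None, None
    for block_idx in range(rows):
        for thread_idx in range(cols):
            offset = candidate_offset(block_idx, thread_idx, rows, cols, step)
            value = score(*offset)
            if best_value is None or value > best_value:
                best_value, best_offset = value, offset
    return best_offset


# Hypothetical score with its maximum at (0.2, -0.3); a real kernel would evaluate
# the cross-correlation between separated orders at each candidate offset.
best = launch_grid(11, 11, 0.1, lambda dx, dy: -((dx - 0.2) ** 2 + (dy + 0.3) ** 2))
print(best)
```

On the GPU the two loops disappear: all candidates are scored concurrently, and only the final arg-max reduction needs coordination between threads.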

As shown in Fig. 1(b), each grid refines the parameter estimate by one order of magnitude. Repeating this iteration yields a set of accurate parameters with sub-pixel precision. Because every candidate within an iteration is evaluated by its own thread, each refinement step is completed in a single parallel computation. This greatly improves the efficiency of parameter estimation and provides a sound basis for the subsequent super-resolution reconstruction, illustrating the value of GPU parallel computing for such compute-intensive tasks.
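The iterative refinement can be sketched as a coarse-to-fine grid search whose step shrinks tenfold per pass, so that each pass adds one decimal place of precision. For brevity, the objective here is the magnitude of the spectrum of a synthetic signal at a candidate frequency. This is a simplified stand-in for the cross-correlation between separated orders, and the grid size and step values are illustrative assumptions.

```python
import cmath
import math


def spectrum_magnitude(signal, freq):
    """|DTFT| of `signal` at `freq`; a stand-in for the cross-correlation objective."""
    n = len(signal)
    return abs(sum(s * cmath.exp(-2j * math.pi * freq * t / n) for t, s in enumerate(signal)))


def refine_peak(signal, start_freq, iterations=3, candidates=21, first_step=0.1):
    """Coarse-to-fine search: each pass ("grid") adds one decimal place of precision."""
    estimate, step = float(start_freq), first_step
    half = candidates // 2
    for _ in range(iterations):
        # On the GPU every candidate in this list would be scored by its own thread.
        grid = [estimate + (i - half) * step for i in range(candidates)]
        estimate = max(grid, key=lambda f: spectrum_magnitude(signal, f))
        step /= 10
    return estimate


N = 64
TRUE_FREQ = 10.37   # assumed sub-pixel peak position (cycles per record)
signal = [cmath.exp(2j * math.pi * TRUE_FREQ * t / N) for t in range(N)]
estimate = refine_peak(signal, start_freq=10)
print(f"{estimate:.3f}")
```

Each pass keeps the number of candidates fixed, so the per-launch workload stays constant while precision grows geometrically.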

2.3. Parallel Method of 3DSIM Reconstruction

In the parallel 3DSIM reconstruction, the separation matrices of the three angles are combined into the $15 \times 15$ matrix shown in Fig. 1(c). In the reconstruction process, the raw data are first Fourier-transformed in GPU memory. The separation matrix is then used to obtain five frequency components; in 3DSIM, the separation matrix for each angle is a $5 \times 5$ matrix. With our data structure, the data from the three illumination angles can be computed in parallel, so the three $5 \times 5$ separation matrices are combined into one $15 \times 15$ matrix. Multiplying the 15 images (five phases at each of three angles) by this matrix yields the separated spectral components of each angle: the 0, $\pm 1$st, and $\pm 2$nd orders.
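The block-diagonal combination can be sketched per pixel under simplifying assumptions: five equally spaced phases per angle, unit modulation depth, and no OTF weighting. Under these assumptions each $5 \times 5$ separation matrix is the inverse of a mixing matrix, which for equally spaced phases is its conjugate transpose divided by 5. The block-diagonal layout of the $15 \times 15$ matrix is our reading of Fig. 1(c), and the per-angle phase offsets are made up for the demo.

```python
import cmath
import math

ORDERS = (-2, -1, 0, 1, 2)


def mixing_matrix(phase_offset):
    """Row n: contribution of each order m to the n-th phase-shifted image.

    Assumes five equally spaced phases and ignores modulation depth and OTF weights.
    """
    phases = [phase_offset + 2 * math.pi * n / 5 for n in range(5)]
    return [[cmath.exp(1j * m * p) for m in ORDERS] for p in phases]


def separation_matrix(mix):
    """For equally spaced phases the inverse is the conjugate transpose divided by 5."""
    return [[mix[n][m].conjugate() / 5 for n in range(5)] for m in range(5)]


def block_diagonal(blocks):
    """Place the per-angle 5x5 matrices on the diagonal of one 15x15 matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0j] * size for _ in range(size)]
    start = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[start + i][start + j] = value
        start += len(b)
    return out


def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Three angles with different (assumed) phase offsets; per-pixel values of the five orders.
offsets = [0.1, 1.3, 2.7]
true_orders = [complex(k + 1, -k) for k in range(15)]
mixes = [mixing_matrix(o) for o in offsets]
raw = []                                    # 15 values: 5 phases x 3 angles
for a, mix in enumerate(mixes):
    raw += matvec(mix, true_orders[5 * a:5 * a + 5])
sep15 = block_diagonal([separation_matrix(m) for m in mixes])
recovered = matvec(sep15, raw)
print(max(abs(r - t) for r, t in zip(recovered, true_orders)))
```

A single $15 \times 15$ multiply per pixel separates all three angles at once, which maps naturally onto one GPU kernel over all pixels.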

Shifting data in the frequency domain is the most memory-demanding step for the GPU. Basic system quantities, such as the OTF matrix, must be kept in GPU memory, and the frequency separation results from the previous step also need to be stored, occupying additional space. On top of this, each of the five separated spectral components of the raw data stack must be shifted to its correct position in frequency space.

In Fig. 2, the variable ftorderims_up stores the image data after the Fourier transform and sorting, and ftorderims holds the image data after frequency-domain separation. The variable ftshiftorderims represents image data that have undergone the Fourier transform and shift operation. The variable originfft is the shift result with phase correction, and anneuation is the shifted version of the calculated notch filter. After notch filtering, ftshiftorderims holds the product of anneuation and originfft.
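A minimal sketch of the notch-filter step, assuming a Gaussian notch of the form $1 - a\,\exp(-|k - k_0|^2 / 2\sigma^2)$ centred on the position to which an order has been shifted. The functional form, `strength`, and `sigma` are illustrative assumptions. The mask plays a role analogous to anneuation, and the element-wise product corresponds to the multiplication described above.

```python
import math


def notch_mask(width, height, center, strength=0.99, sigma=1.5):
    """Gaussian notch centred on a shifted order position (assumed form and parameters)."""
    cx, cy = center
    return [[1.0 - strength * math.exp(-((kx - cx) ** 2 + (ky - cy) ** 2) / (2 * sigma ** 2))
             for kx in range(width)]
            for ky in range(height)]


def apply_mask(spectrum, mask):
    """Element-wise product, analogous to multiplying the shifted data by the filter."""
    return [[s * m for s, m in zip(s_row, m_row)] for s_row, m_row in zip(spectrum, mask)]


spectrum = [[1.0 + 0j] * 16 for _ in range(16)]   # flat toy spectrum
mask = notch_mask(16, 16, center=(10, 6))
filtered = apply_mask(spectrum, mask)
print(round(abs(filtered[6][10]), 3), round(abs(filtered[0][0]), 3))
```

The notch strongly attenuates the region around the order centre, where out-of-focus background concentrates, and leaves distant frequencies almost untouched; the trade-off between `strength` and artifacts is the one discussed for Open-3DSIM below.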

Figure 2. Optimization of the reconstruction algorithm for the GPU method. (a) Traditional method of 3DSIM shifting and the notch filter process, and (b) the shifting and notch filter process in the proposed Cu-3DSIM method. (c) Memory occupied by the core variables in the Open-3DSIM reconstruction process, excluding intermediate variables. (d) Memory occupied by the core variables in the Cu-3DSIM reconstruction process, excluding intermediate variables.


The reconstruction principles of 3DSIM are given in Sec. S4 of Supplement 1. As shown in Fig. 2(a), Open-3DSIM achieves super-resolution reconstruction by sequentially completing the order calculation (first color), spectral shifting (second color), and notch filtering (third color), each of which is carried out for all three angles. In contrast, as shown in Fig. 2(b), Cu-3DSIM processes the three angles one after another, computing the order calculation, spectral shifting, and notch filtering for each angle in sequence. When the number of layers and the image size are small enough that computing all three angles simultaneously does not exceed the GPU capacity, the three angles are processed in parallel instead, which increases the reconstruction speed. In Open-3DSIM, the “get order” step processes the input data of all three angles, and the results for the three directions must be stored. Similarly, the “shift” and “notch filter” steps are carried out for the three angles separately, which generates a large number of intermediate arrays. To reduce GPU memory usage, Cu-3DSIM serializes the calculations of the three angles, so that each angle passes through all three steps (get order, shift, and notch filter) before the next angle starts. Structurally, converting parallel calculations into serial ones increases the calculation time, but it saves a significant amount of GPU memory. This gives Cu-3DSIM a clear memory advantage over Open-3DSIM.
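The choice between parallel and serial angle processing can be expressed as a simple memory-driven schedule. The sketch below is hypothetical: the function name, the per-angle memory estimate, and the 3× overhead factor (loosely based on the 2 to 3 times intermediate-variable figure reported for Cu-3DSIM) are assumptions, not code from the Cu-3DSIM repository.

```python
def plan_angle_batches(bytes_per_angle, free_bytes, n_angles=3, overhead=3.0):
    """Group illumination angles into batches that fit in free GPU memory.

    overhead is an assumed multiplier for intermediate buffers. A single batch means
    all angles run in parallel; one angle per batch means fully serial processing.
    """
    per_angle = bytes_per_angle * overhead
    per_batch = max(1, int(free_bytes // per_angle))   # always process at least one angle
    angles = list(range(n_angles))
    return [angles[i:i + per_batch] for i in range(0, n_angles, per_batch)]


GIB = 1024 ** 3
print(plan_angle_batches(2 * GIB, 20 * GIB))   # [[0, 1, 2]]  small stack: parallel
print(plan_angle_batches(5 * GIB, 20 * GIB))   # [[0], [1], [2]]  large stack: serial
```

Serial batches trade speed for a peak footprint of roughly one angle's intermediates, which is the behavior described for Cu-3DSIM.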

Figures 2(c) and 2(d) show the memory occupied by the core variables of Open-3DSIM and Cu-3DSIM, respectively, during reconstruction; intermediate variables are not included. Bars of different colors represent different calculation processes: the color coding in Fig. 2(c) corresponds to the computational stages in Fig. 2(a), while the colors in Fig. 2(d) represent the multi-angle processing shown in Fig. 2(b). Because the complexity of each process differs, the processes cannot simply be placed on a time axis, so the horizontal axis represents only the sequence of calculation processes. In the actual Cu-3DSIM reconstruction, intermediate variables amounting to a further 2 to 3 times the core variables are generated. In contrast, owing to its iterative reconstruction pipeline and 3D data processing, Open-3DSIM may accumulate intermediate variables up to 14 times the size of the core variables (as exemplified by the analysis of a $1024 \times 1024 \times 19$ dataset).

As mentioned above, the memory actually required is three times that of the counted core variables. Consequently, a single GPU with at most 24 GB of memory cannot meet the requirements of Open-3DSIM, which must therefore repeatedly transfer intermediate results from the GPU to CPU memory for temporary storage. Both the storage of intermediate variables and the data transfer between GPU and CPU prolong the calculation time of Open-3DSIM. With Cu-3DSIM, by contrast, a single GPU can independently complete the 3DSIM reconstruction of a 19-slice stack of $1024 \times 1024$ images, which is impossible with the traditional Open-3DSIM algorithm.
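To give a sense of scale, the following back-of-the-envelope calculation estimates array sizes for the 19-slice $1024 \times 1024$ case. It assumes 15 raw images per slice (3 angles × 5 phases) stored as 16-bit integers, and order spectra stored as single-precision complex values on a 2× laterally upsampled grid. These storage choices are assumptions, not figures from the paper.

```python
GIB = 1024 ** 3


def stack_bytes(nx, ny, nz, images, bytes_per_value, upsample=1):
    """Bytes for `images` arrays of nz slices, each (nx*upsample) x (ny*upsample)."""
    return (nx * upsample) * (ny * upsample) * nz * images * bytes_per_value


raw = stack_bytes(1024, 1024, 19, 15, 2)                 # uint16 raw data
orders = stack_bytes(1024, 1024, 19, 15, 8, upsample=2)  # complex64, 2x lateral grid
print(f"raw data: {raw / GIB:.2f} GiB")                          # raw data: 0.56 GiB
print(f"one copy of the order spectra: {orders / GIB:.2f} GiB")  # one copy of the order spectra: 8.91 GiB
```

Under these assumptions, three copies of the upsampled order spectra (for example, separated, shifted, and filtered) already exceed a 24 GB card, which is why keeping only the intermediates of one angle resident matters.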

Cu-3DSIM also differs from Open-3DSIM in how the raw data are fed into the reconstruction. In Open-3DSIM, the raw data are passed to the reconstruction as a whole stack at once [Fig. 2(c)]. In Cu-3DSIM, the raw data are passed slice by slice until all slices have been processed, so the data-input loop runs as many times as there are slices along the $z$ direction [Fig. 2(d)].

Thus, in the traditional method, the shift results for all three directions must be kept until the corresponding notch filter results are computed, whereas in the proposed method only the final notch filter result needs to be retained. Compared with Open-3DSIM, Cu-3DSIM reduces the required memory by one-third, while the reconstruction calculation itself remains unaltered. This reduction allows more efficient use of resources and can enable faster processing, and because the reconstruction calculation is unchanged, the quality of the results is not compromised. The method therefore offers a more efficient approach to handling large SIM datasets.

3. Experimental Results and Analysis

Fast super-resolution reconstruction is achieved with the proposed reconstruction algorithm and parallel computing method. To obtain high-quality raw images for Cu-3DSIM, the structured illumination must have high modulation contrast. Generating 3D structured illumination requires the interference of three coherent beams. Our imaging system is based on a digital micromirror device (DMD) and a coherent light source. The DMD is a ViALUX V-650L, controlled with the DLP® LightCrafter™ DLPC900 GUI software. The camera is a Hamamatsu ORCA-Flash 4.0 V2 sCMOS camera. The computer is a DELL Precision 3680 workstation with an Intel Core™ i9-14900 processor, 64 GB of CPU memory, and an RTX 3090 graphics card with 24 GB of GPU memory.

To compare the implementations, we reconstructed samples with different numbers of slices and recorded the time and memory required by each algorithm. The results are shown in Fig. 3. The proposed method not only increases the reconstruction speed but also reduces memory usage, while matching or even surpassing the Open-3DSIM reconstruction results. Workstations currently support up to about 128 GB of CPU memory; requirements beyond this limit call for virtual memory, which can slow processing and cause other performance issues. In terms of speed, Cu-3DSIM reduces the reconstruction time by roughly an order of magnitude compared with the classic Open-3DSIM, allowing researchers to process data much more quickly. The statistics in Table 1 show the advantage of Cu-3DSIM in both reconstruction speed and memory usage.

Figure 3. Comparison of reconstruction time and memory allocation. (a) Reconstruction time of $512 \times 512$ image size samples. (b) Reconstruction time of $1024 \times 1024$ image size samples. (c) Reconstruction memory space of $1024 \times 1024$ image size samples.


Table 1. Parallel Method Performance.



Category	Method	7 slices	13 slices	19 slices
Time, $512 \times 512$ (s)	Open-3DSIM	80.6	158.654	245.335
Time, $512 \times 512$ (s)	Cu-3DSIM	6.325	13.732	22.283
Time, $1024 \times 1024$ (s)	Open-3DSIM	475.448	808.425	1199.447
Time, $1024 \times 1024$ (s)	Cu-3DSIM	39.206	75.424	109.478
Memory, $1024 \times 1024$ (GB)	Open-3DSIM	58.6	118.5	237.9
Memory, $1024 \times 1024$ (GB)	Cu-3DSIM	4.7	9.4	19.8
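The speedup and memory ratios implied by Table 1 can be computed directly from the tabulated values; the short script below does only that arithmetic (units as in the table). The time ratios fall between roughly 10.7× and 12.7×, and the memory ratios between 12.0× and 12.6×, consistent with the order-of-magnitude improvement stated in the text.

```python
# Table 1 values: (Open-3DSIM, Cu-3DSIM) for 7, 13, and 19 slices.
TABLE_1 = {
    "Time, 512 x 512 (s)": ([80.6, 158.654, 245.335], [6.325, 13.732, 22.283]),
    "Time, 1024 x 1024 (s)": ([475.448, 808.425, 1199.447], [39.206, 75.424, 109.478]),
    "Memory, 1024 x 1024 (GB)": ([58.6, 118.5, 237.9], [4.7, 9.4, 19.8]),
}

# Ratio of Open-3DSIM to Cu-3DSIM for each category and slice count.
ratios = {category: [o / c for o, c in zip(open3dsim, cu3dsim)]
          for category, (open3dsim, cu3dsim) in TABLE_1.items()}

for category, values in ratios.items():
    print(f"{category}: " + ", ".join(f"{r:.1f}x" for r in values))
```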

As shown in Table 1, for 7 slices of $512 \times 512$ pixel images, Open-3DSIM needs 80.6 s for reconstruction, whereas Cu-3DSIM needs 6.325 s. Fast GPU parallel computing thus greatly improves reconstruction efficiency. This saves researchers time and makes more extensive and in-depth studies of the structures and functions of mitochondria, endoplasmic reticulum (ER), actin, and tubulin feasible. By processing large amounts of data quickly and accurately, Cu-3DSIM provides a powerful tool for cell biology.

When reconstructing on the CPU, virtual memory must be enabled once the required memory exceeds 128 GB, and under otherwise identical conditions this inevitably slows the computation. Compressing the total reconstruction memory to within 20 GB, which fits on a single current high-performance graphics card, avoids this reliance on virtual memory and streamlines the reconstruction. Moreover, if the computer is equipped with two or more graphics cards, the reconstruction could be accelerated further, because the cards can work in parallel and share the computational load, allowing larger datasets to be handled and reconstructions to be completed more quickly.

To compare reconstruction quality, a black algae leaf with a highly complex background is reconstructed with Open-3DSIM and Cu-3DSIM (Fig. 4). The enlarged area shows that the hollow structure of the leaf is depicted more clearly by Cu-3DSIM, indicating that Cu-3DSIM has better optical sectioning: it effectively removes the background at the center of the hollow, which helps to distinguish the structure. Open-3DSIM can suppress such background with a notch filter [Fig. 4(c)]. However, although the filter removes the background very effectively, it may also generate artifacts, so Open-3DSIM with a notch filter must trade off background suppression against artifacts.

Figure 4. Complex-background sample comparison experiment. (a) Reconstruction result of the Cu-3DSIM method, (b) reconstruction result of the Open-3DSIM method, and (c) reconstruction result of Open-3DSIM with the notch filter.


We then compared Cu-3DSIM with OMX, Open-3DSIM, SIMnoise, and CUDAsirecon in the super-resolution reconstruction of actin fibers, as shown in Fig. 5. The reconstruction results show that Cu-3DSIM outperforms the other algorithms. OMX and SIMnoise show strong artifacts in certain areas. As noted above, CUDAsirecon does not optimize storage for limited GPU memory; moreover, its reconstruction resolution [Fig. 5(e)] is lower than that of Cu-3DSIM [Fig. 5(h)]. Compared with the classic Open-3DSIM, Cu-3DSIM replaces the zeroth-order component with the Hilo result, which yields super-resolution reconstructions with a clean background. As a result, Cu-3DSIM visualizes actin fibers more accurately and in greater detail, providing researchers with a valuable tool for studying cellular structures and dynamics.

Figure 5. Cu-3DSIM performance compared with other methods. (a)–(e) Reconstruction results of different methods. (f), (g) Quantitative evaluation of the reconstruction intensity and SNR. (h) Reconstruction results of low-SNR samples. (i) Maximum intensity projections of the reconstruction results.


Super-resolution reconstruction of low-SNR samples with a severely defocused background is a typical and challenging problem in imaging. Open-3DSIM employs two filters and a notch filter to suppress noise and background interference. The proposed method likewise uses two filters to improve reconstruction quality at low SNR. The quantitative SNR assessment of these algorithms is presented in Fig. 5(f). Both visually and quantitatively, Cu-3DSIM reconstructions show a relatively high SNR: the method retains the advantages of Open-3DSIM in low-SNR reconstruction and even outperforms it in suppressing background interference.

In addition, a long-term mitochondrial imaging experiment is presented in Fig. 6, in which the reconstructions of different methods are compared. The dataset consists of 7 frames, each comprising 7 slices. Over the long-term series, Cu-3DSIM reconstructs stably and accurately. In the local features, mitochondrial cristae can be observed with the proposed method but not with Open-3DSIM. The $xoz$ and $yoz$ views further show that Cu-3DSIM achieves better resolution and background suppression than Open-3DSIM. In terms of computation time, Open-3DSIM takes 739.2 s in total (105.6 s per frame on average), whereas Cu-3DSIM reduces this to 58.2 s (8.3 s per frame on average). This speedup enables more extensive and in-depth studies of mitochondrial structure and function.

Figure 6. Long-term reconstruction experiment. (a) Reconstruction results of the Cu-3DSIM method, and (b) reconstruction results of the Open-3DSIM method.


4. Conclusion

In this paper, a novel SIM reconstruction method with GPU parallel acceleration is proposed. A background-suppression strategy yields reconstructions with a clean background: Hilo information replaces the zeroth-order frequency component, suppressing the background while reducing artifacts. A parallel computing method rapidly completes the parameter estimation, frequency separation, frequency shifting, and filtering required for super-resolution reconstruction, improving the reconstruction speed by an order of magnitude while maintaining reconstruction accuracy. Different samples were tested to benchmark performance. By taking advantage of parallel GPU computation, Cu-3DSIM obtains reconstruction results faster, making it possible to iterate rapidly and observe the cell state within the limited cell cycle.

Acknowledgments

Acknowledgment. This work was supported by the National Natural Science Foundation of China (Nos. 62206183, 62025501, 92150301, and 624B2009) and the National Key R&D Program of China (No. 2022YFC3401100). General data are provided to reviewers and uploaded to figshare in Ref. [30]. Software for Cu-3DSIM is uploaded to GitHub in Ref. [31]. It includes software versions, runtime environment configurations, and basic examples.

Category: Research Article

Received: Mar. 14, 2025

Accepted: Jun. 4, 2025

Published Online: Jul. 11, 2025

Author email: Peng Xi (xipeng@pku.edu.cn)
