1. Introduction
The Abbe diffraction limit[1] confines the spatial resolution of fluorescence microscopy to roughly half the wavelength of light, posing a significant challenge in visualizing subcellular structures in detail. To overcome this constraint, super-resolution microscopy techniques have been developed over the past decades. Noteworthy examples include stochastic optical reconstruction microscopy[2,3], photoactivated localization microscopy[4,5], and stimulated emission depletion microscopy[6,7]. Among these innovations, structured illumination microscopy (SIM[8]) stands out for its rapid imaging capabilities, broad field of view, compatibility with a variety of fluorescent probes, and the potential to achieve resolutions twice that allowed by diffraction.
Reconstruction algorithms are essential for SIM[9] as they extract high-frequency details from raw images captured under structured light illumination. Conventional SIM reconstruction typically employs Fourier-transform-based methods that require accurate knowledge of the light field’s physical parameters. However, practical imperfections such as distortion, scattering, and noise can introduce subtle discrepancies, resulting in pronounced artifacts[10,11].
Deep learning (DL) frameworks have become a powerful tool in SIM, providing super-resolved image reconstruction, enhanced resolution, and reduced artifacts stemming from inaccurate parameter estimation[9,12–14]. Recent advances in DL-SIM highlight its versatility and performance improvements across multiple dimensions. Universal reconstruction models based on transfer learning have been developed, enabling cross-system compatibility and achieving imaging speeds of 200 ms per frame[12]. The incorporation of prior knowledge of illumination patterns into rationalized deep learning (rDL) for SIM has enabled sustained live-cell imaging with minimized phototoxicity, significantly enhancing the ability to observe rapid subcellular dynamics[14]. Additionally, neural networks have been shown to reduce the required number of raw images five-fold compared to traditional SIM, while maintaining exceptional photon efficiency under low-light conditions[15]. However, DL-SIM encounters a significant hurdle known as the out-of-distribution problem[16]: when light fields or sample structures deviate substantially from the training set, a model trained under specific conditions may produce artifacts or fail to maintain consistent reconstruction quality, impeding robust generalization in microscopic imaging[11].
In this work, we propose a novel approach called awareness-of-light-field SIM (AL-SIM) to mitigate the out-of-distribution problem in DL-SIM reconstructions. As illustrated in Fig. 1, AL-SIM uses a two-stage algorithm. First, a deep learning model predicts the perturbed illumination light field from the raw SIM data. Next, rather than relying on the standard cosine illumination, we use the predicted light field to generate simulated data for model training, thereby reducing data bias. We validate AL-SIM through comparative experiments on simulated light field distortions and real-world BSC1 cell imaging, demonstrating that it achieves resolution comparable to conventional DL-SIM while significantly reducing artifacts and improving the normalized root mean square error (NRMSE).

Figure 1.AL-SIM pipeline. (a) The first training stage, (b) the first test stage, (c) the second training stage, and (d) the second test stage.
2. Methods
The AL-SIM method employs a two-stage approach. In the first stage, the model is trained to accurately estimate the light fields from the raw SIM data, aiming to capture a distribution closer to the actual physical light field. In the second stage, the predicted light fields from the first stage are used to generate simulated data that more closely reflects the real distribution. This data is then employed to train a reconstruction model, which is subsequently used to produce the super-resolution image from the raw SIM data.
2.1. AL-SIM pipeline
Figure 1 provides an overview of the AL-SIM pipeline. Figures 1(a) and 1(b) illustrate the first stage, in which a network is optimized to predict light fields. Specifically, simulated cosine light fields with distortions are multiplied by emitters, convolved with the simulated point spread function (PSF), and then combined with simulated noise and background to form a synthetic SIM dataset. As shown in Fig. 2, the network adopts a U-Net architecture[17]; its encoder processes the raw SIM data, and the two decoders subsequently reconstruct the features to predict both the light fields and the emitters. The loss function is computed by comparing the predicted outputs with the ground truth, thereby guiding the network to accurately estimate light fields from the raw SIM data. These predicted light fields are then used to generate bias-corrected simulated data for the second training stage. Figures 1(c) and 1(d) illustrate the second stage, where the light fields predicted from real SIM data are employed to create simulated SIM data. Here, the model is trained using a loss function based on the discrepancy between the predicted emitters and the ground truth. Finally, the trained model is applied to real SIM data, producing bias-corrected super-resolution images.

Figure 2.AL-SIM network architecture. The network is based on a U-Net structure with two parallel branches. The middle purple section represents the encoder pathway that processes the raw SIM data through progressive downsampling (via 2 × 2 max pooling operations shown in red arrows), reducing spatial dimensions while increasing feature channels. The yellow sections above and below function as decoders that predict the light fields and emitter distributions, respectively, through systematic upsampling (via bilinear interpolation shown in green arrows), which restores spatial resolution. The architecture features skip connections (gray arrows) that connect the encoder to both decoder pathways, allowing high-resolution features to bypass the bottleneck and preserve spatial information.
2.2. Data synthesis methodology
Our simulated training dataset comprises emitters, light fields, PSFs, noise, and background. For emitters, scatter points, curves, and natural-image-based patterns have proven effective and are already applied in DL-SIM[18,19]. Following this approach, we generate random scatter points and curves while using COCO data[20] for natural images. The random curve generation algorithm proceeds as follows: 1) randomly add control points within the field of view, with each curve containing 100–200 points for diversity; 2) interpolate a smooth curve through these points using Catmull-Rom splines[21], parameterized from 0 to 1; 3) apply random brightness variations along the curve, starting at 100 and incremented by random values of up to 2, to simulate natural fluctuations.
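The curve-generation steps above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the control-point count, field-of-view size, and sampling density (`n_ctrl`, `fov`, `samples_per_seg`) are placeholder values, and the brightness-variation step is omitted for brevity.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    # Uniform Catmull-Rom basis: interpolates between p1 and p2 for t in [0, 1]
    t2, t3 = t * t, t * t * t
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t3)

def random_curve(n_ctrl=6, fov=256, samples_per_seg=20, rng=None):
    """Random smooth curve: random control points joined by Catmull-Rom segments."""
    rng = np.random.default_rng(rng)
    pts = rng.uniform(0, fov, size=(n_ctrl, 2))  # step 1: random control points
    curve = []
    # step 2: one spline segment per interior control-point pair
    for i in range(1, n_ctrl - 2):
        for t in np.linspace(0.0, 1.0, samples_per_seg, endpoint=False):
            curve.append(catmull_rom(pts[i - 1], pts[i], pts[i + 1], pts[i + 2], t))
    return np.array(curve)
```

A Catmull-Rom segment passes exactly through its two inner control points, which is why the interpolated curve threads through every point the generator places.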
For the light fields, the simulated cosine pattern adheres to the parameters of our two-beam interference SIM, with the period varying between 327 and 546 nm. Additionally, nine simulated light fields incorporate distortions and aberrations to mirror the imperfections observed in real SIM data. First, we define a set of nonlinear functions (such as sine, cosine, and tangent). Then, for each channel of each light field image, we randomly select three of these functions and generate three coefficients that sum to 1. We combine these functions and coefficients into a composite function, which is subsequently applied to distort the nine light field images. This approach yields light field images with substantial diversity and complexity.
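A sketch of the composite-distortion idea is given below, under stated assumptions: the function bank here uses `tanh` in place of the unbounded tangent so the toy example stays numerically well behaved, a Dirichlet draw supplies the three coefficients that sum to 1, and the distortion is injected as a weak phase perturbation of the cosine fringe. All function and parameter names are illustrative, not the paper's.

```python
import numpy as np

def make_composite_distortion(rng=None):
    """Pick three nonlinear functions and convex weights summing to 1."""
    rng = np.random.default_rng(rng)
    bank = [np.sin, np.cos, np.tanh]  # tanh substituted for tangent to keep values bounded
    idx = rng.choice(len(bank), size=3, replace=False)
    w = rng.dirichlet(np.ones(3))     # three random coefficients, w.sum() == 1
    funcs = [bank[i] for i in idx]
    return lambda x: sum(wi * f(x) for wi, f in zip(w, funcs))

def distorted_cosine(period_px, shape=(64, 64), rng=None):
    """Cosine fringe with a random composite phase distortion applied."""
    _, x = np.mgrid[:shape[0], :shape[1]]
    phase = 2 * np.pi * x / period_px
    g = make_composite_distortion(rng)
    return 0.5 * (1 + np.cos(phase + 0.3 * g(phase)))  # weakly phase-distorted fringe in [0, 1]
```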
To simulate realistic imaging conditions, we add Gaussian noise to the synthetic images, with a relative amplitude ranging from 0 to 0.1 to represent the different noise levels that might be encountered in practical SIM imaging. The PSF closely matches the parameters of our experimental setup: an objective lens with a numerical aperture of 1.30, a fluorescence wavelength of 600 nm, and a pixel size of 40 nm. A Gaussian filter is used to emulate the smooth background variations commonly seen in microscopic imaging. Finally, emitters, light fields, PSFs, background, and noise are integrated to produce the simulated raw SIM data.
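The full forward model can be condensed into a short numpy-only sketch. It approximates the PSF as a Gaussian (via an FFT-domain blur with periodic boundaries) rather than using the diffraction-limited PSF of the stated objective, and the default widths and levels (`psf_sigma`, `bg_sigma`, `bg_level`, `noise_std`) are placeholder values, not the paper's.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # FFT-based Gaussian blur (periodic boundaries); numpy-only stand-in for a PSF convolution
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    otf = np.exp(-2 * (np.pi ** 2) * (sigma ** 2) * (fx ** 2 + fy ** 2))
    return np.fft.ifft2(np.fft.fft2(img) * otf).real

def simulate_raw_sim(emitters, light_field, psf_sigma=1.5,
                     bg_sigma=10.0, bg_level=0.05, noise_std=0.02, rng=None):
    """Forward model: (emitters x illumination) blurred by PSF, plus background and noise."""
    rng = np.random.default_rng(rng)
    signal = gaussian_blur(emitters * light_field, psf_sigma)
    background = bg_level * gaussian_blur(rng.random(emitters.shape), bg_sigma)
    raw = signal + background + noise_std * rng.standard_normal(emitters.shape)
    return np.clip(raw, 0.0, None)  # photon counts cannot be negative
```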
2.3. Implementation details
Our model was trained using eight NVIDIA A100 80 GB GPUs with a batch size of 32 per GPU (total effective batch size of 256). The training process consisted of two stages: the first stage took approximately 8 h to complete, while the second stage required approximately 1 h. The model was trained for a total of 200 epochs with a learning rate of 0.0003. These computing resources and hyperparameters were carefully selected to ensure optimal model performance while maintaining reasonable training times.
2.4. Loss function
To train AL-SIM, we introduce a composite loss function comprising two terms: one measuring the discrepancy between the predicted light fields and their ground truth, and the other measuring the discrepancy between the predicted emitters and their ground truth. Both terms are computed using the same function $f$, defined as $f(x, y) = \alpha\,\mathcal{L}_{\mathrm{MS\text{-}SSIM}}(x, y) + \beta\,\mathcal{L}_{1}(x, y)$, leading to the overall loss function $\mathcal{L} = f(\hat{L}, L) + f(\hat{E}, E)$, where $\hat{L}$ and $\hat{E}$ denote the predicted light fields and emitters, $L$ and $E$ their ground truths, and $\mathcal{L}_{\mathrm{MS\text{-}SSIM}}$ and $\mathcal{L}_{1}$ represent the multi-scale structural similarity index[22] and the sum of absolute differences (L1), respectively. In our setup, $\alpha$ and $\beta$ are fixed scalar weights.
Using both the MS-SSIM and L1 terms for training offers multiple benefits. The MS-SSIM term captures perceptual and structural information aligned with human vision, while the L1 term enforces pixel-wise accuracy by minimizing the sum of absolute differences. This combination has proven particularly effective for image restoration[22].
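A simplified version of this mixed loss can be sketched as follows. Two caveats: a single-scale, global-statistics SSIM stands in for the multi-scale SSIM the paper cites, and the weight `alpha = 0.84` is a value commonly used in the loss-mixing literature, not necessarily the paper's setting.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Simplified single-scale SSIM using global statistics (the paper uses MS-SSIM)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mixed_loss(pred, target, alpha=0.84):
    # Structural term plus mean-normalized L1 term; alpha balances the two
    l_ssim = 1.0 - ssim_global(pred, target)
    l_1 = np.abs(pred - target).mean()
    return alpha * l_ssim + (1 - alpha) * l_1

def total_loss(pred_lf, gt_lf, pred_em, gt_em):
    # Composite loss: the same mixture applied to light fields and to emitters
    return mixed_loss(pred_lf, gt_lf) + mixed_loss(pred_em, gt_em)
```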
2.5. Conventional SIM algorithm
The conventional SIM algorithm operates by separating high- and low-frequency components in the raw data, which is collected under different phases of the illuminating light field. In the Fourier domain, high-frequency information is shifted to the zero-frequency position and combined with low-frequency components, thereby expanding the overall frequency spectrum. Sinusoidal illumination in multiple directions further broadens the spectrum along different orientations, enhancing spatial resolution. In our experiment, we use fairSIM[23] as the traditional SIM reconstruction algorithm.
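The component-separation step described above can be illustrated with the textbook band-unmixing calculation for one fringe orientation: each raw spectrum is a known phase-weighted mixture of the zero-order and the two shifted bands, so three phase-shifted acquisitions give a 3×3 linear system. The notation and the assumption of a known modulation depth `m` are mine, not the paper's.

```python
import numpy as np

def separation_matrix(phases, m=1.0):
    # Each raw spectrum D_n = C0 + (m/2) e^{+i phi_n} C+ + (m/2) e^{-i phi_n} C-
    return np.array([[1.0, 0.5 * m * np.exp(1j * p), 0.5 * m * np.exp(-1j * p)]
                     for p in phases])

def separate_bands(raw_spectra, phases, m=1.0):
    """Unmix the three frequency bands from three phase-shifted raw spectra."""
    M = separation_matrix(phases, m)
    flat = np.stack([d.ravel() for d in raw_spectra])  # (3, N)
    bands = np.linalg.solve(M, flat)                   # invert the 3x3 phase mixture
    return [b.reshape(raw_spectra[0].shape) for b in bands]
```

After unmixing, the shifted bands are translated back to their true positions in frequency space and recombined, which is the spectrum-expansion step the text describes.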
2.6. DL-SIM
In our implementation of DL-SIM, we employ the widely adopted U-Net architecture for supervised super-resolution tasks to ensure generalizability. The network comprises an encoder-decoder structure with skip connections, where both the encoder and decoder consist of four hierarchical levels. At each level, the encoder applies two convolutional layers followed by downsampling, while the decoder performs upsampling followed by two convolutional layers. The number of filters in the convolutional layers increases progressively from 32 to 64, 128, and finally 256, enabling the network to extract and refine features at different spatial scales. The model takes nine-channel raw SIM images as input and outputs single-channel emitter predictions. For a fair comparison, the second-stage emitter prediction model in AL-SIM uses the identical U-Net architecture as DL-SIM.
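The channel/resolution progression described above can be made concrete with a small bookkeeping helper (a hypothetical illustration; the input size 512 is an example, not a stated parameter):

```python
def unet_encoder_shapes(h, w, base_filters=32, levels=4):
    """Feature-map sizes through the four-level encoder:
    channels double (32 -> 64 -> 128 -> 256) while H and W halve at each level."""
    shapes, c = [], base_filters
    for _ in range(levels):
        shapes.append((h, w, c))
        h, w, c = h // 2, w // 2, c * 2
    return shapes
```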
2.7. Double-beam interference super-resolution structured light microscope
As illustrated in Fig. 3(a), a 532 nm laser beam is expanded by a 5× beam expander to uniformly illuminate the active region of a ferroelectric liquid crystal on silicon spatial light modulator (FLCOS-SLM, QXGA-3DM, Forth Dimension Displays). A polarizing beam splitter (PBS) and a half-wave plate (HWP) are positioned between the laser and the FLCOS-SLM to prepare a precise linear polarization state suitable for the SLM. For the subsequent generation of structured illumination, S-polarization of the interfering beams is key to maximizing fringe contrast, as it ensures optimal electric-field alignment for interference. This S-polarization is efficiently achieved using a zero-order vortex half-wave plate (vortex HWP) placed after the SLM, where it interacts with the diffracted beams. The vortex HWP's spatially varying fast axis converts the incident linearly polarized beams into the required co-polarized S state based on their position, ensuring high-contrast fringes with good energy efficiency.

Figure 3.Experimental setup. (a) Optical path of SIM. (b) Formation of different illumination light fields and phase modulation.
As shown in Fig. 3(b), a binary grating pattern is loaded onto the FLCOS-SLM. The FLCOS-SLM, PBS, and HWP together form a polarization grating that concentrates the laser energy into the desired diffraction orders. Other orders are blocked by a mask, and the zero-order vortex HWP rotates the polarization direction of the beams to generate S-polarization, thereby maximizing the contrast of the structured illumination. After passing through the subsequent relay optics, the diffracted beams interfere on the sample surface to form the structured light.
Fluorescence emitted by the sample is split off via a dichroic mirror and recorded by an sCMOS camera (Hamamatsu, C13440-20CU) through a long-pass filter. Nine grating patterns with varying directions and phases [Fig. 3(b)] generate distinct structured light fields. The super-resolution fluorescence image is then reconstructed from nine raw images acquired under these different illumination patterns.
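The nine illumination patterns (three fringe orientations, each with three phase steps) can be sketched in a few lines of numpy. This is a simulation-side illustration with unit modulation depth and a pixel-unit period, not a description of the SLM control itself.

```python
import numpy as np

def sim_patterns(shape=(128, 128), period_px=8.0, n_angles=3, n_phases=3):
    """Nine cosine fringe patterns: three orientations x three phases."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    k = 2 * np.pi / period_px
    patterns = []
    for a in range(n_angles):
        theta = np.pi * a / n_angles                      # orientations spaced by 60 degrees
        carrier = k * (xx * np.cos(theta) + yy * np.sin(theta))
        for p in range(n_phases):
            phase = 2 * np.pi * p / n_phases              # phase steps of 2*pi/3
            patterns.append(0.5 * (1 + np.cos(carrier + phase)))
    return np.stack(patterns)                             # (9, H, W), values in [0, 1]
```

A useful sanity check: for each orientation, the three phase-shifted fringes sum to a flat field, which is why three phases suffice to separate the frequency bands.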
2.8. Cell culture
African green monkey kidney epithelial cells (BSC1) are cultured in DMEM supplemented with 10% fetal bovine serum (FBS). The cells are maintained under standard conditions (5% CO2, humidified atmosphere at 37°C) and seeded onto No. 1.5 glass-bottom dishes 24 h before sample preparation. For cell passage, the cells are washed three times with pre-warmed PBS and then digested with 0.25% trypsin.
Cells are grown on 35 mm No. 1.5 glass coverslips in glass-bottom dishes. On the day of sample preparation, they reach 50%–70% confluence and are fixed for 10 min in pre-warmed (37°C) fixation buffer containing 4% EM-grade paraformaldehyde and 0.1% glutaraldehyde in PBS. Following fixation, the samples are washed three times with 2 mL PBS and incubated for 30 min at room temperature in PBS containing 5% BSA and 0.5% Triton X-100. Actin stain 555 (Cytoskeleton, PHDH1-A), diluted in the same BSA-Triton solution, is applied, and the cells are incubated for more than 12 h at 4°C in the dark. After incubation, the samples are washed three times with 2 mL PBS for 5 min each. A post-fixation buffer is then applied for 10 min, followed by five washes in sterile water. The remaining sterile water is removed, and the samples are air-dried, sealed with parafilm, and stored.
3. Results
We perform simulation experiments to verify AL-SIM by simulating SIM imaging of filament structures under light field distortions. To test its generalization, we introduce a distortion distribution different from the one used during training. Figure 4(a) presents a representative example from the test set, illustrating the distorted light field, the underlying filament structures, and the raw images. We generate 100 test images and train and evaluate both DL-SIM and AL-SIM on them. Figure 4(b) displays one of the inferred light fields, which is subsequently used for the stage-2 training of AL-SIM to develop an emitter prediction model. On this test set, DL-SIM achieves an average NRMSE of 0.068, while AL-SIM achieves 0.063. As shown in Fig. 4(c), AL-SIM significantly reduces artifacts compared to DL-SIM. To further validate the generalizability of our approach, we implemented our two-stage training method with the RCAN[24] architecture on the same simulation dataset. The results show significant improvement, with the NRMSE decreasing from 0.096 to 0.085, an 11.5% reduction in reconstruction error.
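For reference, an NRMSE of the kind quoted above can be computed as below. The normalization convention (dividing the RMSE by the ground-truth dynamic range) is an assumption on my part, since the text does not state which normalization is used; it also assumes a non-constant ground-truth image.

```python
import numpy as np

def nrmse(pred, gt):
    """Root mean square error normalized by the ground-truth dynamic range.

    Assumes gt is non-constant (gt.max() > gt.min())."""
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    return rmse / (gt.max() - gt.min())
```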

Figure 4.Simulation validation of AL-SIM under light field distortions. (a) Distorted illumination patterns (left), simulated filament structures (middle), and raw SIM images (right). (b) Illumination patterns predicted by the AL-SIM first-stage model. (c) Comparative reconstruction results: (i) AL-SIM, (ii) DL-SIM, (iii) ground truth, and (iv) wide-field microscopy. Scale bars: 1 µm (main panels); 200 nm (magnified insets).
In the real-world experiment, conventional SIM significantly improves the resolution of microfilament structures in BSC1 cells. However, as shown in Figs. 5(c) and 5(d), artifacts affect the accurate localization of other structures located near areas of high fluorescence intensity. These artifacts arise from several factors, including inaccuracies in reconstruction parameters, movement of fluorescent molecules in living cells, refractive index mismatch, and scattering in deeper biological tissues.
Figure 5.Comparative experimental results. (a) Comparison between AL-SIM and wide-field microscopy. (b) Estimated light field patterns (Est LF) derived from experimental raw data [Raw (Exp)] by AL-SIM, which are then combined with simulated emitter data to generate synthetic training samples [Raw (Simulation)] for the second-stage network. (c) Imaging of region 1 using AL-SIM, DL-SIM, conventional SIM, and wide-field microscopy. (d) Spatial resolution comparison along the white dashed line, highlighting that AL-SIM outperforms DL-SIM and conventional SIM in terms of artifacts. (e) Logarithmic Fourier spectra of imaging using AL-SIM, DL-SIM, conventional SIM, and wide-field microscopy. (f) Imaging of region 1 using different techniques and the resolution improvement along the white dashed line. (g) Imaging of region 2 using different techniques and the resolution improvement along the white dashed line [(a) scale bar: 10 µm; (b)–(d) scale bar: 2 µm].
The DL-SIM reconstruction method is trained on a large set of images produced by simulating the SIM imaging process, displaying robust generalization across various simulated sample types. However, when applied to real data, the training set does not account for the illumination field distortions encountered in practice, creating a mismatch between training and test conditions. Consequently, artifacts appear in several regions due to these unmodeled distortions in the illumination patterns, as shown in Figs. 5(c) and 5(d).
The AL-SIM method estimates the light field and uses it to generate a simulated training set. As illustrated in Fig. 5(b), our approach first derives estimated light field patterns (Est LF) from experimental raw data [Raw (Exp)]. These estimated patterns capture the actual illumination conditions present during the experiment, including any distortions. The extracted light field patterns are then combined with simulated emitter distributions to generate synthetic training samples [Raw (Simulation)] that closely match the characteristics of real experimental data. This synthetic dataset serves as the training input for the second-stage network, aligns the training and test data, and thus reduces artifacts [Fig. 5(d)]. Comparing results from different methods reveals that in region 1, where fluorescent molecules are sparsely distributed, artifacts are significantly reduced. As shown in Fig. 5(f), these artifacts appear as side lobes; AL-SIM yields minimal side lobes. In region 2, densely packed fluorescent molecules result in superimposed artifacts from different molecules [Fig. 5(d)]. Even under these high-density conditions, AL-SIM exhibits notable artifact suppression.
Artifacts are often visible in the Fourier spectrum, manifesting as anomalous signals in the low-frequency region[25,26]. Specifically, these artifact components appear as residuals in the Fourier domain[26], as shown in Fig. 5(e). Side-by-side comparisons in Figs. 5(c)–5(e) show that AL-SIM produces fewer artifact components than both DL-SIM and conventional SIM.
We also perform resolution calibration, as shown in Fig. 5(f). Using the full width at half-maximum (FWHM) method, we measure resolutions of 342 nm for wide-field, 250 nm for conventional SIM, 171 nm for DL-SIM, and 167 nm for AL-SIM. Additionally, using decorrelation analysis[27], we obtain values of 272.82 nm (wide-field), 308.31 nm (conventional SIM), 158.60 nm (DL-SIM), and 84.15 nm (AL-SIM). Notably, the decorrelation estimate for conventional SIM is unreliable because of the influence of reconstruction artifacts.
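An FWHM readout like the one above can be extracted from a 1-D line profile as follows. This is a generic sketch with sub-pixel linear interpolation at the half-maximum crossings; the 40 nm pixel size follows Sec. 2.2, and it assumes a single-peaked, background-subtracted profile.

```python
import numpy as np

def fwhm(profile, pixel_size_nm=40.0):
    """FWHM of a single-peaked 1-D intensity profile, with sub-pixel interpolation."""
    p = np.asarray(profile, dtype=float)
    p = p - p.min()                       # crude background subtraction
    half = p.max() / 2.0
    above = np.where(p >= half)[0]
    left, right = above[0], above[-1]
    # Linearly interpolate the half-maximum crossing on each shoulder
    x_left = (left - (p[left] - half) / (p[left] - p[left - 1])
              if left > 0 else float(left))
    x_right = (right + (p[right] - half) / (p[right] - p[right + 1])
               if right < p.size - 1 else float(right))
    return (x_right - x_left) * pixel_size_nm
```

For a Gaussian profile the result should approach 2.355 times its standard deviation, which is a convenient self-check when calibrating against beads or thin filaments.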
4. Discussion
Compared to existing DL-based approaches, the AL-SIM method improves consistency between training and experimental data sets and reduces artifacts caused by out-of-distribution samples. It is applicable not only to cosine illumination light fields formed by double-beam interference but also to more complex imaging techniques. However, it should be noted that the method may have difficulty detecting light fields in images with low contrast, low signal-to-noise ratio, and sparse distribution of fluorescent molecules (e.g., fluorescent beads). We plan to extend this work by compensating for light fields in failed regions.
Additionally, our two-stage training method can significantly enhance other supervised SIM models. Its successful application to diverse architectures, exemplified by notable improvements in the RCAN model, confirms its broad utility in boosting various deep learning frameworks for SIM reconstruction.
5. Conclusion
In this work, we introduce AL-SIM, a novel approach to address the out-of-distribution problem in DL-SIM reconstructions. By accurately estimating the light field, AL-SIM effectively corrects errors due to data distribution variations, achieving twice the resolution of the diffraction limit and significantly reducing artifacts. Our method improves the consistency between training and experimental data and enhances the applicability of SIM in complex biological environments.