Photonics Research, Vol. 8, Issue 8, 1350 (2020)
Fast structured illumination microscopy via deep learning
This study shows that convolutional neural networks (CNNs) can be used to improve the performance of structured illumination microscopy, enabling it to reconstruct a super-resolution image from three raw frames instead of the standard nine. Owing to the isotropy of the fluorescent group, the correlation between the high-frequency information in each direction of the spectrum is learned by training the CNNs. A high-precision super-resolution image can thus be reconstructed from three image frames acquired in one direction. This allows for gentler super-resolution imaging at higher speeds and reduces phototoxicity during imaging.
1. INTRODUCTION
Fluorescence microscopy is an important tool in the life sciences for observing cells, tissues, and organisms. However, the Abbe diffraction limit [1] restricts its resolution to roughly half the wavelength of visible light, and a variety of super-resolution techniques have been developed to surpass this limit.
Owing to its low phototoxicity and high acquisition frame rate, structured illumination microscopy (SIM) stands out among these techniques for achieving optical super-resolution in bio-imaging.
To compute unknown frequencies from the raw data, SIM requires three images with shifted illumination patterns to separate the mixed spatial frequencies along a given orientation. To obtain an isotropic resolution enhancement, this process is repeated with illumination patterns at three different angles, so a total of nine raw images is needed per super-resolved (SR) SIM image, which means the sample must be exposed repeatedly. Reducing the number of raw images used in SIM reconstruction has therefore been an active research topic in recent years, and SR reconstruction from as few as three raw frames has been explored.
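For readers unfamiliar with why exactly three phase shifts per orientation are needed, the standard linear-SIM frequency-mixing relation is sketched here; the notation (modulation depth $m$, pattern vector $\mathbf{p}$, OTF $\tilde{H}$) is generic and not taken from the original text. With sinusoidal illumination $1 + m\cos(2\pi\,\mathbf{p}\cdot\mathbf{r} + \phi)$, the spectrum of each raw frame is

$$\tilde{D}_{\phi}(\mathbf{k}) = \left[\tilde{S}(\mathbf{k}) + \frac{m}{2}e^{i\phi}\,\tilde{S}(\mathbf{k}-\mathbf{p}) + \frac{m}{2}e^{-i\phi}\,\tilde{S}(\mathbf{k}+\mathbf{p})\right]\tilde{H}(\mathbf{k}),$$

where $\tilde{S}$ is the sample spectrum. Acquiring frames at three phases $\phi \in \{0, 2\pi/3, 4\pi/3\}$ yields three equations from which the three unknown components $\tilde{S}(\mathbf{k})$ and $\tilde{S}(\mathbf{k}\mp\mathbf{p})$ can be separated for one orientation; repeating this for three pattern orientations gives the nine raw frames of conventional SIM.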
Machine learning, and deep learning (DL) in particular, has recently emerged as a powerful data-driven route to solving inverse problems in imaging.
The DL framework does not explicitly use any model or prior knowledge, and instead relies on large datasets to “learn” the underlying inverse problem. Convolutional neural networks (CNNs) are among the most widely used DL architectures for such image-processing tasks.
In recent years, deep learning methods have been applied to super-resolution microscopic imaging across a range of modalities, from conventional optical microscopes to SIM.
This paper proposes a deep-learning-based framework to reconstruct SIM images using fewer frames than are currently required. A cycle-consistent generative adversarial network (CycleGAN) is used to reconstruct the super-resolution image (which we call 3_SIM) from three raw structured illumination (SI) images phase-shifted along a single direction (which we call 1d_SIM). Owing to the characteristics of the CycleGAN, the data in training sets A and B do not need to correspond one to one; the network can be trained without paired training data, which reduces the number of training steps needed and saves time. Our method does not require assumptions about the image-formation model and instead creates a super-resolved image directly from the raw data. It requires only three SI images in a given direction, from which a 1d_SIM image is reconstructed, and it then generates a 3_SIM image with a resolution comparable to that of traditional linear SIM methods. The method is parameter free, requires no expertise on the part of the user, is easy to apply to any SIM dataset, and does not rely on prior knowledge of the structure of the sample.
2. METHODS
A. Cycle-Consistent Generative Adversarial Networks
Generative adversarial networks (GANs) [28] consist of a generator and a discriminator trained in competition: the generator learns to produce images that the discriminator cannot distinguish from real samples, while the discriminator learns to tell generated images apart from real ones.
CycleGAN [29] extends this idea to unpaired image-to-image translation. It trains two generators and two discriminators together with a cycle-consistency constraint, so that an image translated from domain A to domain B and back to domain A should match the original image.
The generator consists of three parts: an encoder, a transformer, and a decoder. The encoder extracts features from an image using convolution layers. The transformer then combines nearby features of the image, using six ResNet blocks to transform the feature vectors of an image from domain A to domain B. The residual blocks in the transformer ensure that properties of the inputs to previous layers remain available to subsequent layers, so that the output does not deviate much from the original input; otherwise, the characteristics of the original image would not be retained and the results would be inaccurate. A primary aim of the transformer is to retain characteristics of the original input, such as the size and shape of objects, which makes residual networks a good fit for this kind of transformation. The decoding step is the exact opposite of encoding: it rebuilds low-level features from the feature vector by applying deconvolution layers.
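As a concrete illustration of this encoder–transformer–decoder layout, here is a minimal PyTorch sketch of a CycleGAN-style generator with six residual blocks. The channel widths, normalization layers, and activation choices are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block used in the transformer stage; the skip connection keeps
    the characteristics of the input available to later layers."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    """Encoder -> six ResNet blocks -> decoder (channel widths are assumptions)."""
    def __init__(self, in_ch=1, base=64, n_blocks=6):
        super().__init__()
        layers = [
            # encoder: convolutions extract features and downsample twice
            nn.ReflectionPad2d(3), nn.Conv2d(in_ch, base, 7),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4), nn.ReLU(inplace=True),
        ]
        # transformer: residual blocks recombine features between domains
        layers += [ResnetBlock(base * 4) for _ in range(n_blocks)]
        layers += [
            # decoder: deconvolutions rebuild low-level features from the feature maps
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(3), nn.Conv2d(base, in_ch, 7), nn.Tanh(),
        ]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```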
For the discriminator, a patch-GAN architecture is used: rather than classifying the entire image as real or generated, it classifies overlapping patches of the image, which reduces the number of parameters and encourages sharp local detail.
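A minimal sketch of such a patch-based discriminator, again in PyTorch with assumed layer widths, might look as follows; it outputs a grid of per-patch real/fake scores rather than a single scalar.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Patch-based discriminator: the output is a grid of scores, one per
    overlapping image patch, rather than a single real/fake value."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, base * 8, 4, stride=1, padding=1),
            nn.InstanceNorm2d(base * 8), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 8, 1, 4, stride=1, padding=1),   # per-patch score map
        )

    def forward(self, x):
        return self.model(x)
```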
B. Loss Function
The goal of CycleGAN is to use the given training samples to learn mapping functions between domains A and B by applying adversarial losses to them.
The generators A and B should eventually be able to fool the discriminators regarding the authenticity of the images they generate. This is achieved when the score the discriminator assigns to a generated image is as close to 1 as possible. Let generator $G_A$ map domain $A$ to domain $B$ (and $G_B$ map $B$ to $A$), let $D_A$ and $D_B$ be the corresponding discriminators, and let $a$ belong to domain $A$ and $b$ belong to domain $B$. The generators then seek to minimize
$$\mathcal{L}_{\mathrm{gen}} = \mathbb{E}_{a\sim A}\big[(D_B(G_A(a))-1)^2\big] + \mathbb{E}_{b\sim B}\big[(D_A(G_B(b))-1)^2\big].$$
The last and most important loss function is the cyclic loss, which captures whether an image can be recovered using the other generator. In the image-translation cycle of the networks, each image from domain $A$ should be brought back to the original image, so the difference between the original image and the cyclic image should be as small as possible:
$$\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{a\sim A}\big[\lVert G_B(G_A(a))-a\rVert_1\big] + \mathbb{E}_{b\sim B}\big[\lVert G_A(G_B(b))-b\rVert_1\big].$$
The multiplicative factor $\lambda$ applied to cyc_loss assigns more importance to the cyclic loss than to the discrimination loss, and the CycleGAN total loss is
$$\mathcal{L} = \mathcal{L}_{\mathrm{gen}} + \lambda\,\mathcal{L}_{\mathrm{cyc}}.$$
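The losses above can be sketched in code as follows. This assumes the least-squares ("as close to 1 as possible") form of the adversarial term and an L1 cycle term; the weight `lam` is an assumed value, since the factor used in the paper is not recoverable from the text, and `G_A`, `G_B`, `D_A`, `D_B` refer to the hypothetical modules sketched earlier.

```python
import torch
import torch.nn.functional as F

def generator_losses(G_A, G_B, D_A, D_B, real_A, real_B, lam=10.0):
    """Least-squares adversarial loss plus L1 cycle-consistency loss.
    G_A maps domain A -> B, G_B maps B -> A; lam is an assumed weight."""
    fake_B = G_A(real_A)                       # 1d_SIM (A) -> 9_SIM (B)
    fake_A = G_B(real_B)                       # 9_SIM (B) -> 1d_SIM (A)

    # the generators try to push the discriminator scores for fakes toward 1
    pred_B, pred_A = D_B(fake_B), D_A(fake_A)
    adv = F.mse_loss(pred_B, torch.ones_like(pred_B)) + \
          F.mse_loss(pred_A, torch.ones_like(pred_A))

    # cyclic images should come back to the originals
    cyc = F.l1_loss(G_B(fake_B), real_A) + F.l1_loss(G_A(fake_A), real_B)

    return adv + lam * cyc
```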
C. Training
This paper uses 1d_SIM images (super-resolved in one direction) and 9_SIM images (super-resolved in three directions) as the datasets. The 1d_SIM images contain high-frequency information in only one direction. The CycleGAN is used to learn the missing high-frequency information from a large dataset. Using the trained model, the missing content of a 1d_SIM image is filled in and a super-resolution 3_SIM image is reconstructed.
To train the neural network [Fig. 1(a)], the 1d_SIM and 9_SIM images were used as two unpaired training datasets.
Figure 1.Schematics of the deep neural network trained for SIM imaging. (a) The inputs are 1d_SIM and 9_SIM images generated by nine lower-resolution raw images (using the SIM algorithm) as two training datasets with different training labels. The deep neural network features two generators and two discriminators. These generators and discriminators are trained by optimizing various parameters to minimize the adversarial loss between the network’s input and output as well as cycle consistency loss between the network’s input image and the corresponding cyclic image. The cyclic 9_SIM in the schematics is the final image (3_SIM) desired. (b) Detailed schematics of half of the CycleGAN training phase (generator 1d_SIM and discriminator 9_SIM). The generator consists of three parts: an encoder (which uses convolution layers to extract features from the input image), a converter (which uses residual blocks to combine different similar features of the image), and a decoder (which uses the deconvolution layer to restore the low-level features from the feature vector), realizing the functions of encoding, transformation, and decoding. The discriminator uses a 1D convolution layer to determine whether these features belong to that particular category. The other half of the CycleGAN training phase (generator 9_SIM and discriminator 1d_SIM) is the same as this.
Discriminators A and B take images from the 1d_SIM and 9_SIM datasets, respectively, and are trained by the loss function to identify generated images as outputs of the generators; if the discriminator recognizes an image as generated, that image is rejected. For generators A and B to have their generated images accepted by the discriminators, those images need to be very close to the original images, which is enforced by the generator (adversarial) loss defined above. The discriminators also need to be updated so that they can determine whether an output image is a raw image or one produced by a generator.
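A minimal training step implementing this alternating generator/discriminator update could look like the following sketch. The optimizer settings and data loaders (`loader_A`, `loader_B`) are assumptions for illustration, and `generator_losses` is the hypothetical helper from the previous sketch.

```python
import itertools
import torch
import torch.nn.functional as F

# Assumed setup: G_A, G_B, D_A, D_B are the modules sketched above, and
# loader_A / loader_B yield unpaired batches of 1d_SIM and 9_SIM images.
opt_G = torch.optim.Adam(itertools.chain(G_A.parameters(), G_B.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))      # assumed hyperparameters
opt_D = torch.optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

def lsgan(pred, is_real):
    """Least-squares GAN objective: real patches -> 1, generated patches -> 0."""
    target = torch.ones_like(pred) if is_real else torch.zeros_like(pred)
    return F.mse_loss(pred, target)

for real_A, real_B in zip(loader_A, loader_B):
    # generator step: fool both discriminators while preserving cycle consistency
    opt_G.zero_grad()
    loss_G = generator_losses(G_A, G_B, D_A, D_B, real_A, real_B)
    loss_G.backward()
    opt_G.step()

    # discriminator step: accept raw images, reject generated ones
    opt_D.zero_grad()
    loss_D = (lsgan(D_B(real_B), True) + lsgan(D_B(G_A(real_A).detach()), False)
              + lsgan(D_A(real_A), True) + lsgan(D_A(G_B(real_B).detach()), False))
    loss_D.backward()
    opt_D.step()
```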
3. RESULTS
We validated the proposed method on both simulated and experimental data. To enable a quantitative comparison, the 1d_SIM and 9_SIM images were generated from the same raw datasets: the 1d_SIM images were reconstructed from three of the nine raw SI frames, and the 9_SIM images were reconstructed from all nine raw SI frames. To verify the effectiveness of the neural network on images with different features, we prepared three training datasets containing points, lines, and curves, respectively.
All datasets were generated in MATLAB. First, we generate random binary images, superpose the illumination patterns, and convolve with the point spread function (PSF) to obtain nine noise-free raw SI images. Each pixel represents 10 nm; the PSF is based on the first-order Bessel function, with the NA set to 1.5 and the wavelength set to 532 nm; the pattern vector is 18; and the modulation index is 0.8. We then reconstruct these raw SI images with the SIM algorithm to obtain the 1d_SIM and 9_SIM images used for training.
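A rough re-creation of this simulation pipeline in Python (not the authors' MATLAB code) is sketched below: a sparse random binary object is multiplied by sinusoidal illumination patterns at three orientations and three phases, then blurred with a Bessel-based PSF. The image size, object density, and pattern period are assumed values.

```python
import numpy as np
from scipy.special import j1                     # first-order Bessel function

N, px = 512, 10e-9                                # grid size (assumed) and 10 nm pixels
NA, lam, m = 1.5, 532e-9, 0.8                     # NA, wavelength, modulation index

x = (np.arange(N) - N // 2) * px
X, Y = np.meshgrid(x, x)
r = np.sqrt(X**2 + Y**2)
v = 2 * np.pi * NA / lam * r
v[v == 0] = 1e-12                                 # avoid division by zero at the origin
psf = (2 * j1(v) / v) ** 2                        # Airy-pattern PSF
psf /= psf.sum()
otf = np.fft.fft2(np.fft.ifftshift(psf))          # OTF for Fourier-domain blurring

sample = (np.random.rand(N, N) < 1e-3).astype(float)   # sparse random binary object

raw = []
period = 0.6 * lam / NA                           # assumed illumination period
for theta in (0.0, np.pi / 3, 2 * np.pi / 3):     # three pattern orientations
    kx, ky = np.cos(theta), np.sin(theta)
    for phase in (0.0, 2 * np.pi / 3, 4 * np.pi / 3):   # three phase shifts
        pattern = 1 + m * np.cos(2 * np.pi * (kx * X + ky * Y) / period + phase)
        img = np.real(np.fft.ifft2(np.fft.fft2(sample * pattern) * otf))
        raw.append(img)                           # nine noise-free raw SI frames
```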
The trained model was applied to a distinct set of SIM images generated by the same stochastic simulation. First, model performance was tested on the dataset of point images (Fig. 2).
Figure 2. Comparison of imaging modes on the database of point images. For all methods, the same nine raw SI images were used as the basis for processing. (a) The WF image was generated by summing all raw SI images. (b) The 1d_SIM image was generated from three raw SI images in one direction. (c) The 3_SIM image is the network output, compared with (d) the 9_SIM image reconstructed from all nine raw SI images.
To further quantify the improvement in resolution achieved by the CNN, more complex graphics were used to test the proposed method. Figure 3 shows the results for the dataset of line images.
Figure 3. Using deep learning to transform images in the dataset of lines from 1d_SIM to 9_SIM. (a) WF line image. (b) 1d_SIM line image used as the network input. (c) 3_SIM line image produced as the network output. (d) 9_SIM line image used for comparison. (e) Resolution achieved by the different approaches on the line images.
Figure 4 shows the corresponding results for the dataset of curves.
Figure 4.Deep learning-enabled transformation of images of curves from 1d_SIM to 9_SIM. (a) WF curve image. (b) 1d_SIM image of curves used as input to the neural network. (c) 3_SIM image that was the network output, compared to the (d) 9_SIM image.
The proposed method was also tested on a homemade total internal reflection structured illumination microscopy (TIRF-SIM) setup, shown in Fig. 5.
Figure 5. Experimental setup for TIRF-SIM. A laser beam with a wavelength of 532 nm was employed as the light source. After expansion, the light illuminated a digital micromirror device (DMD), which generated the structured illumination. A polarizer and a half-wave plate were used to rotate the polarization orientation, and a spatial mask was used to filter out excess frequency components. The generated structured illumination was tightly focused by a high-numerical-aperture (NA) oil-immersion objective lens (Olympus).
Fluorescent beads [labeled with Rhodamine 6G (R6G) molecules, Bangs Laboratories] with a nominal diameter of 100 nm were imaged using the SIM system. The microscope was equipped with an oil-immersion objective lens (Olympus), and the excitation wavelength was 532 nm. The peak emission wavelength of the fluorescence was 560 nm. An sCMOS (scientific complementary metal oxide semiconductor) camera (Hamamatsu, ORCA-flash 3.0) was used, with each pixel representing 65 nm in the sample plane.
Images of fixed size were used in the experiment, and the SIM algorithm (fairSIM ImageJ plugin) was used to reconstruct the 1d_SIM and 9_SIM images from the raw data.
The trained model was obtained after 10,000 iterations and was then applied to the 1d_SIM test data of the nanoparticles. As shown in Fig. 6, the network output (3_SIM) closely matches the 9_SIM reconstruction.
Figure 6. Comparison of the experimental results of deep learning [(c) 3_SIM] with (a) the WF image, (b) the 1-direction SIM (1d_SIM) image, and (d) the 9_SIM image. The wide-field image was generated by summing all raw images, the 1d_SIM image was reconstructed using three SI raw images in one direction, and the 9_SIM image was reconstructed from all nine raw images.
For a quantitative assessment of the quality of the images output by the network, the corresponding root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) index [40] were calculated.
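These metrics can be computed, for example, with NumPy and scikit-image; the sketch below assumes both images are normalized float arrays and is not tied to the authors' evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, target):
    """RMSE / PSNR / SSIM of a network output against a reference image.
    Assumes both are float arrays normalized to the range [0, 1]."""
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
    ssim = structural_similarity(target, pred, data_range=1.0)
    return rmse, psnr, ssim
```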
Deep learning can also be used to transform images directly from wide field (WF) to SIM. We therefore compared this WF-to-SIM approach (WF2SIM) with our 1d_SIM-to-SIM approach (1d_SIM2SIM).
As shown in Fig. 7, we analyzed the frequency spectra of the reconstructed images.
Figure 7. Fourier analysis of the reconstructed images. (a) Comparison of the frequency spectra of images with different numbers of Gaussian points. The frequency spectrum of the Gaussian points is highly symmetrical. (b) The different colors indicate different types of frequency information: the yellow area represents the frequency information of the original image, and the green area represents the information restored by the network. The grid in (b) represents the relationship between the available frequency information and the frequency information recovered by the network. (c) The Fourier transform of the reconstructed images.
We calculated the correlation of the spectral information over a length of $k_c$ (where $k_c$ is the OTF cutoff frequency) along the $k_x$ and $k_y$ directions.
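A rough stand-in for this directional spectral-correlation analysis is sketched below; sampling the spectrum only along the positive $k_x$ and $k_y$ axes is an assumption made for illustration.

```python
import numpy as np

def directional_spectrum_correlation(img, k_cut):
    """Correlation between spectral magnitude profiles along the kx and ky axes,
    sampled out to the OTF cutoff k_cut (given in pixels)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
    profile_x = spec[cy, cx:cx + k_cut]           # spectrum along the kx axis
    profile_y = spec[cy:cy + k_cut, cx]           # spectrum along the ky axis
    return np.corrcoef(profile_x, profile_y)[0, 1]
```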
In the WF2SIM process, the network has to recover all of the high-frequency information in the image [Fig. 7(b)], because the WF input contains none of it.
In the 1d_SIM2SIM process, some high-frequency information is already present in the images [Fig. 7(b)], so the network needs to restore only the components missing in the other directions, which makes the reconstruction more reliable.
Different models were trained using different numbers of images for the WF2SIM and 1d_SIM2SIM training datasets, and they were used to reconstruct the WF and 1d_SIM test images, respectively. Figures 8(b)–8(h) show the corresponding network outputs.
Figure 8. Comparing WF-to-9_SIM with 1d_SIM-to-9_SIM. (a) The 9_SIM image reconstructed from nine SI raw images. (b)–(d) Network outputs when 200, 500, and 900 image pairs (1d_SIM and 9_SIM) were used to train the network models. (e)–(h) Network outputs when 100, 200, 500, and 900 image pairs (WF and 9_SIM) were used as the training datasets. Each network underwent 10,000 iterations. Some details were not correctly restored by the WF-to-9_SIM training models; the arrows in (a)–(h) point to a missing detail.
4. CONCLUSION
Since the introduction of structured illumination microscopy, numerous algorithms have been developed to reconstruct super-resolved images from SI images. Considerable effort has been invested in reducing the number of raw SI frames, but the images produced by such approaches are often of poor quality and require parameter tuning.
This study proposed a fast, precise, and parameter-free method for super-resolution imaging using fewer SI frames. Unpaired simulated SIM images were used for unsupervised training of the CycleGAN network. The experiments showed that the CycleGAN used in this work performed well in generating a reconstructed SIM image from three raw SIM frames (3_SIM); the quality of the generated image was very similar to that of the original nine-frame SIM image (9_SIM). Images reconstructed from 1d_SIM inputs through the CNNs were of better quality than those reconstructed from WF images, and the 1d_SIM-to-9_SIM model also delivered better performance during training. In addition, recent studies have applied deep learning to super-resolution microscopy, as noted in the Introduction.
The central idea of the proposed technique is based on the observation that the SI image datasets contain a large amount of structural information. By the principle of ergodicity, the statistical information learned from such large dataset ensembles is sufficient to predict, from a 1d_SIM image, a 9_SIM image with high fidelity.
All images were blindly generated here by the deep network: that is, the input images were not previously seen by the network. Thus, the network can recover images by learning missing high-frequency information from large datasets, instead of merely replicating the images.
As a purely computational technique, the proposed method does not require any changes in current systems of microscopy and requires only standard 1d_SIM and 9_SIM images for training. Although different types of images need to be trained separately, the neural network used in our method enables us to complete the training efficiently. Once the model has been trained, it can be applied to new 1d_SIM images to rapidly generate a 9_SIM image. This approach can also be extended to nonlinear SIM to reduce the number of frames needed to render it suitable for bio-imaging.
Acknowledgment
L. Du acknowledges the support given by the Guangdong Special Support Program.
[1] E. Abbe. Contributions to the theory of the microscope and of microscopic perception. Arch. Mikrosk. Anat., 9, 413-468 (1873).
[12] E. Narimanov. Resolution limit of label-free far-field microscopy. Adv. Photon., 1, 056003 (2019).
[13] E. F. Fornasiero, K. Wicker, S. O. Rizzoli. Super-resolution fluorescence microscopy using structured illumination. Super-Resolution Microscopy Techniques in the Neurosciences, 133-165 (2014).
[28] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial nets. Proceedings of the 27th International Conference on Neural Information Processing Systems, 2672-2680 (2014).
[29] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. IEEE International Conference on Computer Vision (ICCV), 2242-2251 (2017).
[30] M. Mirza, S. Osindero. Conditional generative adversarial nets (2014).
[31] L. A. Gatys, A. S. Ecker, M. Bethge. Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2414-2423 (2016).
[32] P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros. Image-to-image translation with conditional adversarial networks. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5967-5976 (2017).
[33] C. Li, M. Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. Computer Vision—European Conference on Computer Vision (ECCV), 702-716 (2016).
[34] N. Sundaram, T. Brox, K. Keutzer. Dense point trajectories by GPU-accelerated large displacement optical flow. Computer Vision—European Conference on Computer Vision (ECCV), 438-451 (2010).
[35] C. Godard, O. Mac Aodha, G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6602-6611 (2017).
[36] K. He, X. Zhang, S. Ren, J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778 (2016).
[40] Z. Wang, E. P. Simoncelli, A. C. Bovik. Multi-scale structural similarity for image quality assessment. Conference Record of the 37th Asilomar Conference on Signals, Systems & Computers, 1398-1402 (2003).
Chang Ling, Chonglei Zhang, Mingqun Wang, Fanfei Meng, Luping Du, Xiaocong Yuan, "Fast structured illumination microscopy via deep learning," Photonics Res. 8, 1350 (2020)
Category: Imaging Systems, Microscopy, and Displays
Received: May 11, 2020
Accepted: Jun. 15, 2020
Published Online: Jul. 23, 2020
The Author Email: Chonglei Zhang (clzhang@szu.edu.cn), Luping Du (lpdu@szu.edu.cn), Xiaocong Yuan (xcyuan@szu.edu.cn)