1State Key Laboratory of Mesoscopic Physics and Frontiers Science Center for Nano-optoelectronics, School of Physics, Peking University, Beijing 100871, China
2National Biomedical Imaging Center, Peking University, Beijing 100871, China
3Institute of Modern Optics, Nankai University, Tianjin Key Laboratory of Micro-Scale Optical Information Science and Technology, Tianjin 300350, China
4Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan 030006, China
5Peking University Yangtze Delta Institute of Optoelectronics, Nantong 226010, China
Multi-angle illumination is a widely adopted strategy in various super-resolution imaging systems, where improving computational efficiency and signal-to-noise ratio (SNR) remains a critical challenge. In this study, we propose the integration of the iterative kernel correction (IKC) algorithm with a multi-angle (MA) illumination scheme to enhance imaging reconstruction efficiency and SNR. The proposed IKC-MA scheme demonstrates the capability to significantly reduce image acquisition time while achieving high-quality reconstruction within 1 s, without relying on extensive experimental datasets. This ensures broad applicability across diverse imaging scenarios. Experimental results indicate substantial improvements in imaging speed and quality compared to conventional methods, with the IKC-MA model achieving a remarkable reduction in data acquisition time. This approach offers a faster and more generalizable solution for super-resolution microscopic imaging, paving the way for advancements in real-time imaging applications.
1. INTRODUCTION
Microscopy plays an important role in the field of biomedicine. With the development of microscopy, the resolution of microscopes has been continuously improved. However, since the inherent trade-off between numerical aperture (NA) and field of view (FOV) in objective lens design persists as a fundamental physical constraint, expanding the space–bandwidth product (SBP) remains a central challenge in contemporary microscopy research. According to the Rayleigh criterion, optimal spatial resolution fundamentally requires maximizing NA or minimizing the illumination wavelength. However, visible-spectrum illumination operates within defined spectral limits, precluding indefinite wavelength reduction. Increasing the aperture of the objective lens often comes with larger aberrations and higher cost. Moreover, the materials available for high-NA objective lenses are limited. Achieving resolution enhancement in microscopy systems without compromising other critical performance parameters therefore remains a key focus in microscopy research.
Multi-angle (MA) illumination has been widely applied in various super-resolution microscopy techniques [1–5]. Combined with computational optics, such as synthetic aperture methods, MA illumination can extend the NA and achieve super-resolution imaging [6,7]. This approach is particularly effective for enhancing resolution under the NA–FOV trade-off inherent to optical systems. Typical examples are lens-less imaging [1,8,9], synthetic aperture imaging [10], and Fourier ptychographic microscopy (FPM) [11,12]. Unlike traditional wide-field microscopy, which uses a single illumination angle, MA illumination imaging collects more spatial-frequency information to enable super-resolution imaging [11] or 3D reconstruction [13]. However, MA illumination imaging often requires pre-designed illumination angle combinations [14,15] and relies on complex algorithms [11] to reconstruct the sample's complex amplitude or 3D refractive index distribution. These factors increase acquisition and reconstruction time, limiting its application in real-time imaging. Over the years, several extensions and improvements have been achieved in the acquisition time [16–18] and reconstruction time [19–23]. However, most improvements address only one issue at a time, and the challenge of simultaneously reducing both acquisition and reconstruction time remains.
In recent years, deep learning has emerged as a powerful tool in image reconstruction. Scholars have applied deep neural networks to improve the speed of reconstruction in FPM [24–27]. However, the existing deep networks often require a substantial amount of experimental data and the reconstruction results of traditional recovery algorithms as the training set, resulting in poor generalization ability. Iterative kernel correction (IKC) is a single-image super-resolution (SISR) recovery algorithm that utilizes a residual neural network to iteratively correct the blur kernel, achieving more precise image restoration, particularly in complex or unknown blurring scenarios [28]. It effectively reduces the algorithm’s dependence on a large-scale experimental training set.
Leveraging the physical principle of MA illumination imaging, the paper builds a deep learning network on IKC, named IKC-MA, which can greatly improve the generalization ability of the network. IKC-MA includes three sub-networks for MA illumination: the prediction (P) network, the reconstruction (R) network, and the correction (C) network. In addition, the network uses Zernike aberration polynomials instead of the traditional principal component analysis (PCA) method, which can effectively improve the final image reconstruction quality. Finally, we find that IKC-MA also has comparable reconstruction quality for a small number of images. Simulation and experimental results verify the effectiveness of the proposed model. Compared with the traditional MA illumination microscope technology, the image acquisition time of this method can be shortened by half, and the image reconstruction can be completed within 1 s. In addition, different from the deep-learning-based recovery network, our network is built based on the physical model of MA illumination, which does not depend on a large-scale experimental training set, and greatly improves its generalization ability.
2. METHODS
IKC-MA is a super-resolution algorithm that utilizes a deep learning framework to correct the optical system’s pupil function and reconstruct a super-resolution (SR) image from MA low-resolution (LR) inputs. In the IKC-MA, the MA illumination predictor network (P) estimates the feature vector from the LR images. The MA reconstruction network (R) takes the feature vector and the MA-illuminated LR images as input and outputs SR images through a physics-constrained frequency-domain mapping. Subsequently, the MA corrector network (C) iteratively optimizes the feature vector based on residuals from the SR images of the R network. This iterative process leads to progressively improved pupil function estimates and, consequently, higher-quality super-resolution images.
A. Forward Degradation Model
In the IKC-MA network, the degradation process of a high-resolution (HR) image can be represented by Eq. (1):
$$I_l^{\mathrm{LR}} = \left[ \left( I^{\mathrm{HR}} \cdot P_l \right) \otimes \mathrm{PSF} \right] {\downarrow}_s + n, \quad l = 1, 2, \ldots, N, \tag{1}$$
where $I_l^{\mathrm{LR}}$ is the LR image under the $l$-th illumination angle with a total of $N$ different illumination angles, $I^{\mathrm{HR}}$ is the high-resolution image, PSF is the point spread function of the system, $P_l$ represents the illumination at the $l$-th illumination angle, $\otimes$ is the convolution operation, ${\downarrow}_s$ is the down-sampling operation with a scale factor $s$, and $n$ is the isotropic Gaussian noise.
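As an illustration, the sketch below simulates a forward model of this kind with NumPy, assuming oblique illumination modeled as a tilted plane wave, PSF convolution applied as a pupil filter in the frequency domain, and intensity detection at the camera; the function and variable names are ours and the pupil radius is arbitrary, not values from the released code.

```python
import numpy as np

def simulate_lr_image(obj, pupil, kx, ky, scale=4, noise_std=0.01, rng=None):
    """Minimal sketch of the Eq. (1)-style degradation model (illustrative only).

    obj   : complex high-resolution object field (H x W)
    pupil : coherent transfer function on the HR frequency grid, DC at center (H x W)
    kx,ky : illumination plane-wave spatial frequencies (cycles per pixel)
    scale : down-sampling factor s
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = obj.shape
    yy, xx = np.mgrid[0:h, 0:w]
    tilt = np.exp(2j * np.pi * (kx * xx + ky * yy))        # oblique illumination P_l
    spec = np.fft.fft2(obj * tilt)                          # spectrum shifted by the illumination
    field = np.fft.ifft2(spec * np.fft.ifftshift(pupil))    # low-pass by the pupil (PSF convolution)
    intensity = np.abs(field) ** 2                          # camera records intensity
    lr = intensity.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))  # down-sample by s
    return lr + rng.normal(0.0, noise_std, lr.shape)        # additive Gaussian noise n

# Example: circular pupil on the HR grid, on-axis illumination.
H = W = 256
fy, fx = np.meshgrid(np.fft.fftshift(np.fft.fftfreq(H)),
                     np.fft.fftshift(np.fft.fftfreq(W)), indexing="ij")
pupil = (np.sqrt(fx ** 2 + fy ** 2) < 0.1).astype(float)
obj = np.random.rand(H, W) * np.exp(1j * np.random.rand(H, W))
lr0 = simulate_lr_image(obj, pupil, kx=0.0, ky=0.0)
```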
Assuming the PSF size is $k \times k$, the PSF space is a $k^2$-dimensional linear space. Consequently, the pupil function space is also $k^2$-dimensional. To improve computational efficiency, in the IKC-MA network the pupil is projected onto a reduced $t$-dimensional linear subspace using a dimensionality reduction matrix $M$ constructed from Zernike polynomials. It is represented by a feature vector $h$, whose elements are the Zernike polynomial coefficients [29,30]. Unlike common approaches in deep learning that rely on PCA [31,32], we adopt Zernike polynomials to compress the pupil from the frequency-domain space to a lower-dimensional coefficient space, better aligning with the physical characteristics of optical systems.
B. IKC-MA Network
The IKC network is a single-image super-resolution (SISR) technique [28] based on an unknown PSF, which enhances LR images into SR images. Building on it, the IKC-MA network integrates the physical framework of MA illumination and the real imaging process with aberrations. The IKC-MA network simultaneously receives hundreds of LR images at different illumination angles and through three component networks—P, R, and C—generates SR intensity and phase images.
The IKC-MA network is shown in Fig. 1. The role of the P model is to initialize the feature vector $h$, the R model performs the reconstruction, and the C model optimizes the feature vector $h$. A brief overview of these three models is provided below, with detailed information available in Appendix A.
The P network initializes the feature vector $h$. It takes the $N$ raw LR images as input (where $N$ is the total number of illumination angles) and outputs the first estimate of the feature vector, $h_0$. The core design involves processing all LR images with the same convolutional kernels to extract global degradation features, which are independent of the illumination angles. This strategy allows the network to disregard angle-specific details and instead focus on the cross-angle commonality of the system.
The R network, based on the SFTMD network [28], reconstructs the SR images from the LR images. This model takes as input the feature vector and raw LR images. The output consists of SR intensity and phase images. The reconstruction process involves three main steps: stitching the feature vector with the raw LR images, applying the spatial feature transform (SFT) layer [33,34], and performing sub-pixel convolution.
The stitching of the feature vector with the LR images is built upon the SRMD network [34]. First, the input feature vector $h$ is expanded into a 3D feature matrix $H$, where all elements in the $j$-th layer of $H$ are identical to the $j$-th element of $h$. The expanded 3D feature matrix is then fused with the features of the LR images, serving as input for the SFT layer. The core of the R model lies in the use of SFT layers. The SFT layer calculates transformation parameters based on the dimensionality-stretched feature matrix (implicit pupil) and intermediate feature images from the network, thereby leveraging the degradation features to reconstruct the super-resolution images. These parameters, $\gamma$ (scaling factors) and $\beta$ (shifting factors), perform spatial transformations on the stitched input during the reconstruction process. Finally, the R model outputs SR amplitude and phase images through a sub-pixel convolution module.
The C network focuses on iteratively optimizing the feature vector through residuals. Its input consists of the SR intensity and phase images and the feature vector from the previous iteration, $h_{i-1}$. The output is the update for the feature vector, $\Delta h_i$. To reduce computational cost, the network indirectly compares the error between the ground-truth feature vector and the estimated feature vector, instead of directly comparing the error between the SR ground-truth image and the SR reconstructed image.
C. IKC-MA Workflow
Initially, the P network gives the estimate $h_0$; then the first SR image recovery result $I_0^{\mathrm{SR}}$ is produced by the R network.
In the $i$-th iteration, the specific steps are as follows: the C network first estimates the correction $\Delta h_i$ from the SR images of the previous iteration and $h_{i-1}$, and the feature vector is updated as $h_i = h_{i-1} + \Delta h_i$; the R network then reconstructs new SR images $I_i^{\mathrm{SR}}$ from the $N$ LR images and $h_i$. A sketch of this loop is given below.
After $t$ iterations, $I_t^{\mathrm{SR}}$ is the final output of IKC-MA.
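A minimal sketch of this P–R–C inference loop, assuming trained PyTorch modules with the input/output forms described above (the function name, shapes, and iteration count are illustrative assumptions):

```python
import torch

def ikc_ma_inference(lr_stack, P, R, C, num_iters=5):
    """Sketch of the IKC-MA iterative inference loop (names and signatures are assumptions).

    lr_stack : tensor of shape (1, N, H, W), the N multi-angle LR images
    P, R, C  : trained predictor, reconstruction, and corrector networks
    """
    with torch.no_grad():
        h = P(lr_stack)            # initial 28-dim Zernike feature vector h_0
        sr = R(lr_stack, h)        # first SR amplitude/phase estimate
        for _ in range(num_iters):
            h = h + C(sr, h)       # corrector outputs the residual Delta h_i
            sr = R(lr_stack, h)    # reconstruct with the refined feature vector
    return sr, h
```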
Many networks use PCA to reduce the dimension of the initial image [35,36]. However, PCA is not aligned with the physical principles of image formation. In the IKC-MA network, PCA is replaced by Zernike polynomials, which better reflect the real imaging process. This substitution enhances the network’s alignment with real-world imaging and reduces the feature vector dimension to 28. In order to clearly show that the Zernike polynomial reconstruction is better than the PCA reconstruction, we conducted pre-experiments: we built an MA illumination network using PCA with 105 coefficients for dimensionality reduction, named the PCA network, and compared the PCA network with the IKC-MA network.
In summary, the IKC-MA network uses Eq. (1) to generate N LR images for constructing the training set at scale. It then uses Zernike polynomials to construct the dimensionality reduction matrix and obtain the feature vector. With the N LR images as input, SR images are generated by iteratively optimizing through the three sub-networks P, R, and C. Once training is complete, the deep learning network naturally has the advantage of a very short reconstruction time.
D. Experimental Setup
The training set consists of images from the Flickr2K [37] and DIV2K [38] datasets, totaling 12,800 HR images, with 6400 randomly selected as intensity images and the remainder as corresponding phase images. Each image is divided into four patches.
We employ PCA and Zernike polynomial decomposition, simulating pupil functions for super-resolution (SR) image generation via MATLAB software. System parameters include a 0.13 NA objective lens and an illumination wavelength of 470 nm. The LED interval is 4 mm, and the height difference between the LED array and the sample is 186 mm. The setup simulates the LR image generation process of an MA illumination system. The deep learning framework is implemented using PyTorch on a server with 64 GB of RAM. The training dataset comprises 80,000 images, with a learning rate set at . The batch size for each training session is 16. The computer hardware includes an Intel i9-7920X CPU, and an NVIDIA TITAN RTX GPU with CUDA 11.1 and PyTorch 1.8.1.
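For reference, the simulation geometry above determines the attainable illumination angles. The following sketch computes the illumination NA of an off-axis LED and the resulting synthetic NA, using the standard relations $\mathrm{NA_{syn}} \approx \mathrm{NA_{obj}} + \mathrm{NA_{ill}}$ and half-pitch resolution $\approx \lambda / (2\,\mathrm{NA_{syn}})$; these relations and the example LED index are our assumptions, not values stated above.

```python
import numpy as np

wavelength = 470e-9    # illumination wavelength (m)
na_obj     = 0.13      # objective NA
pitch      = 4e-3      # LED interval (m)
height     = 186e-3    # LED array to sample distance (m)

def illumination_na(n_x, n_y):
    """Illumination NA of the LED located (n_x, n_y) intervals from the optical axis."""
    r = np.hypot(n_x, n_y) * pitch
    return np.sin(np.arctan2(r, height))

# e.g. an LED seven intervals off-axis (the array extent is an assumption here)
na_ill = illumination_na(7, 0)
na_syn = na_obj + na_ill
print(f"NA_ill = {na_ill:.3f}, synthetic NA = {na_syn:.3f}, "
      f"half-pitch resolution ~ {wavelength / (2 * na_syn) * 1e9:.0f} nm")
```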
3. RESULTS AND DISCUSSION
A. Simulation Results
Figure 2 shows the simulation results of the IKC-MA system for MA illumination imaging. Here, we list the recovery results and the comparison of the two systems we built for MA illumination imaging. The difference between them lies in the dimensionality reduction method applied to the pupil function. The first system, the PCA network, takes 105-order PCA for feature extraction. Considering that Zernike circular polynomials better approximate the real imaging process, the second system uses 28 Zernike circular polynomials to represent the pupil function for SR reconstruction. Figure 2(a1) is the central LR image with the central illumination source at LED (0,0). Figure 2(a2) is a schematic diagram of different illumination angles corresponding to a series of LR images. The central red dot indicates the illumination angle corresponding to Fig. 2(a1). Figure 2(b1) is the high-resolution (HR) intensity image, which is taken as the ground truth (GT) to evaluate the recovery effect. Figure 2(c1) is the SR intensity image generated by the 105-order PCA. Figure 2(d1) is the SR intensity image generated by the 28 Zernike polynomials method; Figs. 2(b2)–2(d2) are the zoomed-in views of the boxed regions in Figs. 2(b1)–2(d1).
Figure 2. The simulation recovery results of the IKC-MA system for MA illumination imaging. (a1) Central low-resolution image under the central angle LED (0, 0) illumination; (a2) schematic diagram of different illumination angles corresponding to a series of LR images; the central red dot indicates the illumination angle corresponding to (a1). (b1) High-resolution (HR) intensity image, as the ground truth (GT) to evaluate the recovery effect; (c1) SR intensity image generated by the 105-order PCA method, and (d1) SR intensity image generated by the 28 Zernike circular polynomials method; (b2)–(d2) zoomed-in views of the boxed regions in (b1)–(d1).
Comparing Figs. 2(c1) and 2(d1) with Fig. 2(a1), it can be seen that both networks are able to reconstruct an SR image from the series of original LR images. Further comparing Fig. 2(d2) with Fig. 2(c2), the IKC-MA system using Zernike polynomials outperforms the method using PCA. The PCA network result in Fig. 2(c2) exhibits artifacts and noise, which degrade the quality of the restored image. Because the IKC-MA network models the imaging process through the physical pupil function, the result in Fig. 2(d2) is better.
Many MA illumination imaging recovery algorithms require high spectral overlap as a necessary condition for convergence; we verify that the IKC-MA does not require high spectral overlap, as shown in Fig. 3. We employ annular illumination distributions with progressively increasing densities (1, 4, 6, 8, 12, 16, 20, 24, and 30 elements per annulus). Considering the spectral similarity of adjacent illumination units in the outer rings of the LED array, the LR images are collected at every other LED position, so that finally 121 LR images are collected.
Figure 3. The reconstruction results of different numbers of LR images by the IKC-MA system. (a) Central LR image under the central angle LED (0, 0), (b) high-resolution intensity image, which is the ground truth to evaluate the recovery effect, and (c) SR image generated from 241 LR images. (d) SR image generated from 121 LR images. (e) Schematic diagram of different illumination angles corresponding to the LR images used in (c), while (f) corresponds to (d).
Figure 3 demonstrates the halving of the acquisition time of the IKC-MA system. Figure 3(a) is the central LR image under the central angle illumination, Fig. 3(b) is the high-resolution intensity image, which is the ground truth to evaluate the recovery effect, and Fig. 3(c) is the SR image generated from 241 LR images. In order to halve the number of images, 121 LR images are taken to generate SR images. Figure 3(d) is the SR image generated from 121 LR images. Visually, Figs. 3(c) and 3(d) demonstrate nearly identical reconstruction quality, but the acquisition time of the latter is only half of the original. Figures 3(e) and 3(f) are schematic diagrams of the illumination angles corresponding to the LR images used in Figs. 3(c) and 3(d), respectively.
To further evaluate the proposed method quantitatively, we take the structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) as indicators of image quality. Table 1 gives the quantitative evaluation, in which larger SSIM and PSNR values correspond to better imaging results. Based on the values in Table 1, the Zernike method is superior to the PCA method in both SSIM and PSNR. The comparison between the IKC-MA system with 241 LR images and the IKC-MA system with 121 LR images shows that IKC-MA can reduce the data acquisition time by half while maintaining the quality of the reconstructed image.
Table 1. Quantitative Evaluation of Simulation Results

Method                 SSIM      PSNR        Recover Phase
Traditional IKC [28]   0.8201    28.750 dB   No
PCA-Net                0.9292    34.692 dB   Yes
IKC-MA+241LR           0.9499    36.968 dB   Yes
IKC-MA+121LR           0.9403    35.985 dB   Yes
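The SSIM and PSNR values reported in Table 1 can be reproduced with standard image-quality metrics; below is a minimal helper using scikit-image, where the normalization of both images to [0, 1] is an assumption about the preprocessing.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(sr, gt):
    """Compute SSIM and PSNR between a reconstruction and its ground truth (illustrative helper)."""
    sr = np.clip(sr, 0.0, 1.0)
    gt = np.clip(gt, 0.0, 1.0)
    ssim = structural_similarity(gt, sr, data_range=1.0)
    psnr = peak_signal_noise_ratio(gt, sr, data_range=1.0)
    return ssim, psnr

# usage: ssim, psnr = evaluate(sr_intensity, ground_truth_intensity)
```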
B. Experimental Results
The illumination source of the optical system is a circular LED array. The array consists of nine concentric rings with the following radial spacing distribution: the distance from the center to the first ring is 12 mm, while all subsequent adjacent rings are spaced 10 mm apart. From the center outward, each ring contains 1, 8, 12, 16, 24, 32, 40, 48, and 60 LED emitters, totaling 241 illumination-emitting units, all arranged with equal angular intervals. The experiment uses monochromatic illumination, with all LEDs operating in the blue spectral channel (central wavelength 470 nm). The distance between the LED array plane and the sample plane is set to 186 mm. The optical imaging system is equipped with an Olympus infinity-corrected microscope objective lens (OPLNFL4X, 0.13 NA). Image acquisition is conducted with a FLIR FL3-U3-13Y3M monochromatic imaging sensor featuring a 4.8 μm pixel pitch.
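A short sketch of this LED array geometry, which also generates a 121-LED subset by keeping every other LED in each ring; the angular offset of each ring and the exact subset selection are assumptions consistent with the counts given above.

```python
import numpy as np

ring_counts = [1, 8, 12, 16, 24, 32, 40, 48, 60]            # LEDs per ring (241 total)
ring_radii  = [0.0] + [12.0 + 10.0 * i for i in range(8)]    # ring radii in mm
height_mm   = 186.0                                          # LED plane to sample distance

def led_positions(subsample=1):
    """(x, y) coordinates of the LEDs; subsample=2 keeps every other LED per ring (121 images)."""
    pts = []
    for count, radius in zip(ring_counts, ring_radii):
        for k in range(0, count, subsample):
            theta = 2 * np.pi * k / count
            pts.append((radius * np.cos(theta), radius * np.sin(theta)))
    return np.array(pts)

full    = led_positions(1)   # 241 LEDs -> 241 LR images
reduced = led_positions(2)   # 121 LEDs -> half the acquisition time
na_ill  = np.sin(np.arctan2(np.hypot(full[:, 0], full[:, 1]), height_mm))
print(len(full), len(reduced), na_ill.max())
```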
Figure 4 shows the recovery results of different systems for the target (USAF1951, Edmund, 64-863). Figure 4(a1) displays the central LR intensity image acquired through a 4×, 0.13 NA (Olympus) objective lens. Figure 4(a2) is the HR intensity image acquired through a 20× objective lens (Olympus), which serves as the ground truth for evaluating the restoration effect. Figures 4(b1) and 4(b2) are the SR intensity image and phase image recovered by the traditional FPM algorithm. Figures 4(c1) and 4(c2) are the SR intensity and phase images recovered by the IKC-MA system from 241 LR images. Figures 4(d1) and 4(d2) are the SR intensity and phase images recovered by the IKC-MA system from 121 LR images. Figure 4(e) is the device diagram of the system used to collect the experimental LR images. From bottom to top, the imaging system consists of the LED array, the sample, the objective lens, and the camera. Figure 4(f) is the intensity distribution curve at the position indicated by the yellow arrow in Figs. 4(b1), 4(c1), and 4(d1). The red, yellow, and blue curves correspond to the red, yellow, and blue lines in Figs. 4(b1), 4(c1), and 4(d1), respectively.
Figure 4. The USAF1951 SR results by the IKC-MA system and the traditional FPM system. (a1) Central LR intensity image with 4× objective lens. (a2) HR intensity image with 20× objective lens, which is the ground truth to evaluate the recovery effect. (b1), (b2) SR intensity image and phase image recovered by the traditional FPM algorithm. (c1), (c2) SR intensity and phase images recovered by the IKC-MA system from 241 LR images. (d1), (d2) SR intensity and phase images recovered by the IKC-MA system from 121 LR images. (e) The diagram of the experimental setup. From bottom to top, the imaging system consists of the LED array, the sample, the objective lens, and the camera; (f) intensity distribution curve at the position indicated by the yellow arrow in (b1), (c1), and (d1). The red, yellow, and blue curves correspond to the red, yellow, and blue lines in (b1), (c1), and (d1), respectively.
While the result in Fig. 4(d1), derived from 121 LR images, exhibits marginal degradation in the quantitative metrics, the Group 9 Element 3 features of the USAF1951 target remain resolvable. It can be seen from the phase images that the 9-3 vertical fringes cannot be distinguished in Fig. 4(b2), but can be clearly distinguished in Figs. 4(c2) and 4(d2). In general, the reconstruction quality of the IKC-MA system is better than that of the traditional FPM reconstruction. Compared with the FPM system, the data acquisition time and recovery time of the IKC-MA system are greatly reduced.
Figure 5 shows the comparison of recovery results of brain cell sections by different systems. Figure 5(a1) is the central LR intensity image acquired through a 4×, 0.13 NA objective lens. Figure 5(a2) is the SR intensity image acquired through the 20× objective lens, which is the ground truth for evaluating the restoration effect. Figures 5(b1) and 5(b2) are the SR intensity image and phase image recovered by the traditional FPM algorithm. Figures 5(c1) and 5(c2) are the SR intensity and phase images recovered by the IKC-MA system from 241 LR images. Figures 5(d1) and 5(d2) are the SR intensity and phase images recovered by the IKC-MA system from 121 LR images.
Figure 5. The SR results of different systems on brain cell samples. (a1) Central LR image with 4× objective lens, and (a2) SR image with 20× objective lens (ground truth). (b1), (b2) SR intensity and phase images of brain cell recovery by the traditional FPM algorithm. (c1), (c2) SR intensity and phase images recovered by the IKC-MA system from 241 LR images. (d1), (d2) SR intensity and phase images recovered by the IKC-MA system from 121 LR images.
From the perspective of intensity images, FPM, IKC-MA with 241 LR images, and IKC-MA with 121 LR images all achieve high-resolution reconstructions, but the SR images restored by the FPM and IKC-MA methods differ in image quality. The traditional FPM reconstruction introduces noise into the recovered images, so the image in Fig. 5(b1) has obvious white speckle noise, and the FPM reconstruction result is over-sharpened; for example, the nucleus at the center of Fig. 5(b1) is incorrectly restored. In contrast, IKC-MA recovers a slightly smoother result without amplifying the noise. For phase images, the recovery results of the FPM algorithm are worse than those of IKC-MA with 241 LR images and IKC-MA with 121 LR images. Compared with the FPM system, the IKC-MA system demonstrates significant improvements in data acquisition speed and reconstruction speed, as quantitatively compared in Table 2.
Table 2. Quantitative Evaluation of Experimental Results by Different Systems

Method         Recovery Time (s)   Acquisition Time (s)
FPM            30                  270
PCA-Net        0.3                 270
IKC-MA+241LR   0.3                 270
IKC-MA+121LR   0.3                 136
C. Discussion
To mitigate sample dependency in deep learning frameworks, we formulate a physics-driven forward model that synthesizes training-compliant datasets through rigorous emulation of MA illumination imaging principles. The IKC-MA network uses the simulated images generated by the forward model instead of the actual sample data to construct the training set. The results show that the IKC-MA network has strong generalization ability. The IKC-MA network utilizes Zernike polynomials for pupil function initialization, replacing conventional PCA methods. This substitution leverages Zernike polynomials’ compatibility with optical aberration in microscopy systems, thereby enhancing reconstruction quality.
Compared to traditional MA illumination recovery algorithms, the IKC-MA network can converge with lower spectral overlap, thus effectively reducing the number of LR images and shortening the data acquisition time. Extensive studies have demonstrated that when conditions such as the restricted isometry property (RIP) in compressive sensing and sparsity assumptions are satisfied, the number of LR images can be reduced while still enabling high-resolution (HR) image reconstruction [39–41]. This indicates significant redundancy in data acquisition for traditional algorithms. However, many reconstruction algorithms, such as the Gerchberg-Saxton (GS) algorithm in Fourier ptychographic microscopy (FPM), rely on gradient-based iterative optimization. To ensure convergence, these algorithms require sufficient phase-information cross-validation. In other words, high-frequency spectral overlap is not a prerequisite for acquiring sufficient spectral information about the object but rather a requirement for algorithmic convergence. This provides the foundation for the IKC-MA to reduce the number of LR images while maintaining reconstruction ability.
Three key factors underlie the IKC-MA network. First, for low-overlap LR images, the IKC-MA network adopts an end-to-end collaborative optimization architecture: the P-model learns to map the information from the LR images into the frequency domain while extracting cross-angle shared features; the R-model generates HR images using the information predicted by the P-model; and the C-model performs iterative correction. Through this adaptive learning in the frequency domain, the network dynamically compensates for the loss of spectral information. Second, while traditional methods often assume an ideal pupil function, the IKC-MA network models practical factors such as amplitude attenuation (e.g., uneven illumination sources), phase distortion (e.g., system aberrations), and sensor noise. In traditional recovery algorithms, these factors exacerbate the tendency to fall into local optima, necessitating higher spectral overlap to ensure convergence. Finally, compared with traditional algorithms, which are mostly based on linear relationships, the IKC-MA network implicitly applies pupil functions as physical constraints. Through the powerful nonlinear mapping capability of deep learning, it effectively learns priors between spatial and frequency information from large-scale datasets, thereby enabling the reconstruction of high-frequency details from low-overlap LR inputs.
It should be noted that the study adopts the term “multi-angle illumination imaging” rather than confining itself to specific techniques, which stems from the theoretical universality of the proposed method. The IKC-MA network only requires MA LR images as input; it shares the same input form with techniques such as FPM and IDT [13]. The network achieves reconstruction by mapping information from the LR images into the frequency space and then adaptively learning in this domain. The IKC-MA algorithm is independent of high-frequency spectral overlap or Kramers-Kronig (KK) relation constraints. This characteristic makes the IKC-MA suitable for MA illumination imaging in general.
There is still much room for improvement in the IKC-MA method, and further research can address the following aspects. First, combining the simulation and experimental results, the imaging performance on real images is not as good as on natural images, indicating that noise and errors in real data cause some interference. In the future, the correction of systematic errors can be incorporated into the IKC-MA. Second, a relatively large amount of training data is required for adequate training; with 500,000 training iterations, the training time exceeds 160 h. Model simplification strategies can be explored in the future. Finally, the current network mainly handles single-channel grayscale images, whereas in MA imaging some scholars have reconstructed three-channel color information of the sample. The existing system could be extended to three-channel color recovery in the future.
4. CONCLUSIONS
The paper combines the physical model of MA illumination with the deep learning algorithm of blind super-resolution imaging [28] and proposes an IKC-MA network suitable for MA illumination. The principle of IKC-MA, the network architecture, and its working principle are introduced, and the model is validated through simulation and experimental data.
Finally, the simulation and experimental results show that the proposed network has multiple advantages. Because Zernike polynomials match the optical aberrations in microscopy systems, the model achieves enhanced imaging quality and superior noise reduction compared to conventional methods. Since the network is trained under an unknown pupil, it has higher generalization than other methods [25] combining deep learning and MA illumination. Compared with typical MA illumination technology [11], the network can reduce the data acquisition time and recovery time while obtaining SR intensity and phase images.
APPENDIX A: IKC-MA MODEL CONSTRUCTION
The IKC-MA model decomposes the optimization objective into two sequential steps.
First, with the feature vector $h$ fixed, the super-resolution (SR) reconstruction is optimized. The step corresponds to the R (multi-angle illumination reconstruction) network, whose optimization function is formulated as
$$\theta_R = \arg\min_{\theta_R} \left\| R\!\left( \{ I_l^{\mathrm{LR}} \}_{l=1}^{N}, h; \theta_R \right) - I^{\mathrm{HR}} \right\|,$$
where $\theta_R$ denotes the parameters of the R network.
Second, with the SR images fixed, the residual of the feature vector is optimized. The step corresponds to the C (multi-angle illumination correction) network, whose optimization function is defined as
$$\theta_C = \arg\min_{\theta_C} \left\| C\!\left( I_i^{\mathrm{SR}}, h_{i-1}; \theta_C \right) - \left( h - h_{i-1} \right) \right\|,$$
where $\theta_C$ denotes the parameters of the C network and $h$ is the ground-truth feature vector.
Multi-angle Illumination Predictor Network (P)
The P network initializes the feature vector by taking the raw low-resolution (LR) images as input and generating the initial estimate $h_0$. For feature vector initialization, the prediction network iteratively applies convolutional layers with 64 kernels and leaky ReLU activation over six alternating stages. The output of the final activation undergoes average pooling to produce the initial feature vector. All LR images are processed through identical convolutional kernels, followed by a global average pooling layer and the fully connected network, yielding a 28-dimensional Zernike coefficient vector $h_0$.
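A minimal PyTorch sketch consistent with this description of the P network; the kernel sizes, the treatment of the N LR images as input channels, and the final linear projection are assumptions.

```python
import torch
import torch.nn as nn

class PredictorP(nn.Module):
    """Sketch of the P network: six conv + LeakyReLU stages, global pooling, 28-dim output."""
    def __init__(self, n_leds=241, n_coeffs=28, channels=64):
        super().__init__()
        layers, in_ch = [], n_leds                  # the N LR images enter as input channels
        for _ in range(6):                          # six alternating conv / LeakyReLU stages
            layers += [nn.Conv2d(in_ch, channels, 3, padding=1),
                       nn.LeakyReLU(0.1, inplace=True)]
            in_ch = channels
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)         # global average pooling
        self.fc = nn.Linear(channels, n_coeffs)     # fully connected projection to h_0

    def forward(self, lr_stack):                    # lr_stack: (B, N, H, W)
        f = self.pool(self.features(lr_stack)).flatten(1)
        return self.fc(f)                           # h_0: (B, 28) Zernike coefficients

h0 = PredictorP()(torch.rand(1, 241, 64, 64))       # -> torch.Size([1, 28])
```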
Multi-angle Illumination Reconstruction Network (R)
The R model is a reconstruction network designed to reconstruct high-resolution (HR) images from a sequence of N low-resolution (LR) images. The core of the model lies in the use of the spatial feature transform (SFT) layer [33,34], which eliminates the influence of the point spread function (PSF) on the reconstructed image quality. At the beginning of the R model, the N LR images are first processed through three convolutional layers, each containing 64 two-dimensional convolution kernels. The output of these convolutional layers is then activated using leaky ReLU activation.
The structure of the SFT layer is illustrated in the green-colored portion of Fig. 1. The module has two inputs: the processed feature vector $h$ and the feature images containing LR information. First, the feature vector $h$ is expanded into a three-dimensional feature matrix by converting each element of $h$ into a 2D matrix; the $j$-th layer of this three-dimensional matrix is equal to the $j$-th element of the feature vector $h$. On the other hand, the N LR images are convolved through three layers, each containing 64 convolution kernels, and then activated with leaky ReLU. This step serves to extract features from the LR images.
The SFT layer concatenates the expanded three-dimensional feature matrix and the convolved LR image matrix along the channel dimension. The concatenated matrix is then passed through two convolutional layers, each containing 32 convolution kernels, followed by leaky ReLU activation. The two activation results are sequentially multiplied with and added to the input of the SFT layer to produce the output of the SFT layer. That is, the SFT layer applies convolutions to the concatenated features to generate the modulation parameters $\gamma$ (scaling factors) and $\beta$ (shifting factors), dynamically controlling the feature response along the reconstruction path:
$$F_{\mathrm{out}} = \gamma \odot F_{\mathrm{in}} + \beta,$$
where $\odot$ is the Hadamard product, $F_{\mathrm{in}}$ and $F_{\mathrm{out}}$ are the input and output feature maps of the SFT layer, and $\gamma$ and $\beta$ are the scaling factors and shift factors.
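A possible PyTorch implementation of such an SFT layer, with the feature vector stretched into a conditioning map and two small convolutional branches producing γ and β; the kernel sizes and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Sketch of a spatial feature transform layer: F_out = gamma ⊙ F_in + beta."""
    def __init__(self, feat_ch=64, cond_ch=28, hidden=32):
        super().__init__()
        self.gamma = nn.Sequential(nn.Conv2d(feat_ch + cond_ch, hidden, 3, padding=1),
                                   nn.LeakyReLU(0.1, inplace=True),
                                   nn.Conv2d(hidden, feat_ch, 3, padding=1))
        self.beta  = nn.Sequential(nn.Conv2d(feat_ch + cond_ch, hidden, 3, padding=1),
                                   nn.LeakyReLU(0.1, inplace=True),
                                   nn.Conv2d(hidden, feat_ch, 3, padding=1))

    def forward(self, feat, h):
        # stretch the feature vector h into a 3D map matching the spatial size of feat
        cond = h[:, :, None, None].expand(-1, -1, *feat.shape[-2:])
        x = torch.cat([feat, cond], dim=1)
        return self.gamma(x) * feat + self.beta(x)

out = SFTLayer()(torch.rand(1, 64, 32, 32), torch.rand(1, 28))
```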
The reconstruction network further incorporates residual blocks (RBs) to encapsulate two SFT layers and two convolutional layers, each containing 64 convolution kernels. The input and output of each RB are linked to prevent gradient vanishing.
At the end of the reconstruction network, there is a sub-pixel convolution module, which contains a convolution layer and a pixel-shuffling layer. The convolution layer expands the channel dimension of the input feature maps by a factor of $r^2$, where $r$ is the super-resolution factor, and the pixel-shuffling layer rearranges them into feature images enlarged by a factor of $r$ in each spatial dimension. These are the SR amplitude and phase images.
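The sub-pixel convolution step can be sketched with PyTorch's PixelShuffle; the channel counts here (64 input features, 2 outputs for amplitude and phase) and the kernel size are assumptions.

```python
import torch
import torch.nn as nn

r = 4                                             # super-resolution factor
upsampler = nn.Sequential(
    nn.Conv2d(64, 2 * r * r, kernel_size=3, padding=1),  # expand channels by r^2
    nn.PixelShuffle(r),                           # (B, 2*r^2, H, W) -> (B, 2, r*H, r*W)
)
sr = upsampler(torch.rand(1, 64, 64, 64))         # -> torch.Size([1, 2, 256, 256])
```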
The final forward pass of the reconstruction network is as follows. First, the input data are convolved through three layers of 64 convolution kernels, followed by leaky ReLU activation. The activated result is processed sequentially through 32 RBs to eliminate the influence of the PSF and extract spatial features. The reconstruction network further adds the input of the first RB to the output of the last RB to avoid gradient vanishing. To extract high-resolution spatial features, the sum is imported into a separate SFT layer. After the final convolutional layers, the result is up-sampled to obtain the final reconstruction, i.e., the HR amplitude and phase. The main architecture of the R model is inspired by the SFTMD model of the IKC network [28] and SRResNet [42].
Multi-angle Illumination Corrector Network (C)
The core of the C model lies in iteratively optimizing the feature vector through residual corrections. For each iteration, the correction network takes the predicted SR images from the $i$-th iteration and the feature vector $h_{i-1}$ from the $(i-1)$-th iteration as inputs. Specifically, the SR images undergo seven alternating convolutions with 64 convolutional kernels and leaky ReLU activation layers, while $h_{i-1}$ is processed through two fully connected layers, each with 64 nodes, and then expanded into a three-dimensional matrix. This matrix is then combined with the activated result of the SR images.
The combined result is then passed through two convolutional layers: one with 128 convolution kernels and the other with 64 convolution kernels. Finally, global pooling is applied to obtain the average value, which updates the feature vector and is denoted as $\Delta h_i$.
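A minimal PyTorch sketch of the C network following this description; the concatenation of the two branches, the kernel sizes, and the final linear head are assumptions.

```python
import torch
import torch.nn as nn

class CorrectorC(nn.Module):
    """Sketch of the C network: maps (SR images, h_{i-1}) to the correction Delta h_i."""
    def __init__(self, sr_ch=2, n_coeffs=28, channels=64):
        super().__init__()
        convs, in_ch = [], sr_ch
        for _ in range(7):                               # seven conv / LeakyReLU stages on the SR images
            convs += [nn.Conv2d(in_ch, channels, 3, padding=1),
                      nn.LeakyReLU(0.1, inplace=True)]
            in_ch = channels
        self.sr_branch = nn.Sequential(*convs)
        self.h_branch = nn.Sequential(nn.Linear(n_coeffs, channels),   # two FC layers on h_{i-1}
                                      nn.LeakyReLU(0.1, inplace=True),
                                      nn.Linear(channels, channels))
        self.fuse = nn.Sequential(nn.Conv2d(2 * channels, 128, 3, padding=1),  # 128 then 64 kernels
                                  nn.LeakyReLU(0.1, inplace=True),
                                  nn.Conv2d(128, channels, 3, padding=1))
        self.head = nn.Linear(channels, n_coeffs)

    def forward(self, sr, h_prev):
        f = self.sr_branch(sr)                                        # features of the SR images
        hmap = self.h_branch(h_prev)[:, :, None, None].expand(-1, -1, *f.shape[-2:])
        g = self.fuse(torch.cat([f, hmap], dim=1)).mean(dim=(2, 3))   # global average pooling
        return self.head(g)                                           # Delta h_i

dh = CorrectorC()(torch.rand(1, 2, 256, 256), torch.rand(1, 28))
```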
APPENDIX B: SETUP OF THE DIMENSIONALITY REDUCTION MATRIX
Setup of the PCA Dimensionality Reduction Matrix
The principal component analysis (PCA) [31] achieves dimensionality reduction by projecting data onto the directions of maximum variance through eigendecomposition of the covariance matrix. It preserves the principal information of the data.
According to Eq. (1) in the main manuscript, let the pre-dimensionality-reduction pupil be $P$ of size $k \times k$, which is flattened into a column vector $x$ of length $k^2$. For $m$ images, we construct the data matrix $X = [x_1, x_2, \ldots, x_m]$.
First, prior to analyzing variable relationships, we remove global intensity effects by centering the data. The mean vector is $\mu = \frac{1}{m} \sum_{i=1}^{m} x_i$. Subsequently, we center the data matrix by subtracting the mean vector:
$$\tilde{X} = X - \mu \mathbf{1}^{\mathsf{T}},$$
where $\tilde{X}$ is the mean-removed data matrix, $X$ is the original data matrix, $\mu$ is the mean vector, and $\mathbf{1}$ is an all-ones column vector of length $m$.
Next, the covariance matrix is computed to quantify pairwise interactions between variables, mathematically expressed as
$$C = \frac{1}{m-1} \tilde{X} \tilde{X}^{\mathsf{T}}.$$
The covariance matrix undergoes eigendecomposition to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{k^2}$ (sorted in descending order) and corresponding eigenvectors $v_1, v_2, \ldots, v_{k^2}$, formally expressed as
$$C v_j = \lambda_j v_j, \quad j = 1, 2, \ldots, k^2.$$
Finally, we select the top $q$ eigenvalues and their corresponding eigenvectors to construct the dimensionality reduction matrix:
$$W = [v_1, v_2, \ldots, v_q].$$
The reduced-dimensional data matrix is obtained by projecting the centered data onto the principal subspace spanned by $W$. Then, the low-dimensional approximation model is
$$Y = W^{\mathsf{T}} \tilde{X}, \qquad X \approx \mu \mathbf{1}^{\mathsf{T}} + W Y,$$
where $Y$ is the reduced-dimensional data matrix, $W$ is the PCA projection matrix, and the $j$-th row of $Y$ is the projection of the data matrix onto the direction $v_j$, which corresponds to the projection coefficients in the low-dimensional space.
However, approximating the original $k^2$-dimensional data with $q$-dimensional data introduces residuals. The residual matrix can be expressed as $E = \tilde{X} - W W^{\mathsf{T}} \tilde{X}$, where $E$ contains the unexplained information. The effectiveness of dimensionality reduction is determined by variance retention. Typically, we select the top $q$ components such that the cumulative contribution rate reaches a predefined threshold (e.g., 95%), indicating successful dimensionality reduction:
$$\frac{\sum_{j=1}^{q} \lambda_j}{\sum_{j=1}^{k^2} \lambda_j} \ge 95\%.$$
For the pupil function of the system, its dimensionality-reduced representation is expressed in the expanded form
$$P \approx \mathrm{vec}^{-1}\!\left( \mu + \sum_{j=1}^{q} c_j v_j \right),$$
where $\mathrm{vec}^{-1}(\cdot)$ indicates the process of restoring the vector obtained by unfolding the matrix column-wise back to the original matrix; the expanded representation is expressed as
$$P \approx \bar{P} + \sum_{j=1}^{q} c_j V_j + E,$$
where $\bar{P}$ is the mean matrix, $V_j$ is the $j$-th eigen-matrix (principal component), $c_j$ is the projection coefficient of the $j$-th principal component, $E$ is the residual error, and $q$ is the selected order of PCA. In the comparative PCA method described in the main manuscript, the PCA order is set to $q = 105$.
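For clarity, a NumPy sketch of this PCA construction applied to a stack of sample pupils; the real-valued toy data and the variable names are ours.

```python
import numpy as np

def pca_basis(pupils, q=105):
    """Sketch of the PCA dimensionality reduction described above.

    pupils : array of shape (m, k, k), m sample pupil functions (real-valued here)
    returns the mean vector, the projection matrix W, and the coefficients Y.
    """
    m, k, _ = pupils.shape
    X = pupils.reshape(m, k * k).T                  # data matrix, one flattened pupil per column
    mu = X.mean(axis=1, keepdims=True)              # mean vector
    Xc = X - mu                                     # centered data
    C = Xc @ Xc.T / (m - 1)                         # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1][:q]           # keep the top-q components
    W = eigvecs[:, order]
    Y = W.T @ Xc                                    # low-dimensional coefficients
    return mu, W, Y

mu, W, Y = pca_basis(np.random.rand(500, 32, 32), q=105)
X_hat = mu + W @ Y                                  # low-dimensional approximation of the data
```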
In summary, the PCA method achieves dimensionality reduction by projecting data onto the top principal components through eigendecomposition of the covariance matrix. This process relies solely on the statistical properties of the data and does not account for its physical connotation.
Setup of the Zernike Dimensionality Reduction Matrix
Zernike polynomials are employed to address optical aberrations with rotational symmetry, particularly well-suited for wavefront representation in optical systems. Defined as a set of orthogonal basis functions over the unit circle, they enable compression of the optical system’s point spread function (PSF) into a lower-dimensional coefficient space.
Zernike polynomials are defined on the unit circle through a combination of radial polynomials and angular functions, with the general form [30,43]
$$Z_n^{m}(\rho, \theta) = \begin{cases} R_n^{m}(\rho) \cos(m\theta), & m \ge 0, \\ R_n^{|m|}(\rho) \sin(|m|\theta), & m < 0, \end{cases}$$
where $\rho$ is the radial coordinate ($0 \le \rho \le 1$), $\theta$ is the angular coordinate, $R_n^{|m|}(\rho)$ is the normalized radial polynomial, and $(\rho, \theta)$ are the polar coordinates on the unit disk.
To ensure the orthogonality of Zernike polynomials, we perform orthogonalization such that polynomials of different orders remain mutually orthogonal:
$$\iint_{\rho \le 1} Z_n^{m}(\rho, \theta)\, Z_{n'}^{m'}(\rho, \theta)\, \rho \, \mathrm{d}\rho \, \mathrm{d}\theta = c_n^m\, \delta_{nn'} \delta_{mm'},$$
where $c_n^m$ is a normalization constant and $\delta$ is the Kronecker delta.
The wavefront function $W(\rho, \theta)$ characterizing the wavefront of the optical system can be expanded in the orthogonal, complete set of Zernike polynomials. The Zernike coefficients of the expansion can be obtained by projecting the wavefront function onto the Zernike polynomials (an inner product over the unit disk). The procedure is
$$W(\rho, \theta) = \sum_{j} a_j Z_j(\rho, \theta), \qquad a_j = \frac{1}{c_j} \iint_{\rho \le 1} W(\rho, \theta)\, Z_j(\rho, \theta)\, \rho \, \mathrm{d}\rho \, \mathrm{d}\theta.$$
Dimensionality reduction is achieved by retaining low-order Zernike polynomial coefficients. Typically, low-order modes suffice to characterize the majority of optical aberrations in practical systems, thereby allowing higher-order modes to be neglected while maintaining reconstruction fidelity.
For the pupil function of the system, its dimensionality-reduced representation is expressed in the expanded form
$$P(\rho, \theta) \approx \sum_{j=1}^{28} a_j Z_j(\rho, \theta) + E,$$
where $E$ represents the error term and $a_j$ are the Zernike polynomial coefficients. The coefficient vector formed by the 28 values of $a_j$ is the feature vector $h$.
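A NumPy sketch of building a 28-mode Zernike dimensionality reduction matrix (all modes with $n \le 6$) and projecting a pupil phase onto it; the unnormalized Zernike convention and the least-squares projection used here are assumptions.

```python
import numpy as np
from math import factorial

def zernike(n, m, rho, theta):
    """Zernike polynomial Z_n^m on the unit disk (unnormalized convention)."""
    R = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + abs(m)) // 2 - s) * factorial((n - abs(m)) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    ang = np.cos(m * theta) if m >= 0 else np.sin(-m * theta)
    return np.where(rho <= 1.0, R * ang, 0.0)

def zernike_matrix(k=64, n_max=6):
    """Dimensionality reduction matrix: one column per Zernike mode (28 modes for n_max=6)."""
    y, x = np.mgrid[-1:1:k * 1j, -1:1:k * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    cols = [zernike(n, m, rho, theta).ravel()
            for n in range(n_max + 1) for m in range(-n, n + 1, 2)]
    return np.stack(cols, axis=1)                    # shape (k*k, 28)

M = zernike_matrix()
phase = np.random.rand(64, 64).ravel()               # toy pupil phase to project
h, *_ = np.linalg.lstsq(M, phase, rcond=None)        # 28 Zernike coefficients (feature vector h)
phase_hat = (M @ h).reshape(64, 64)                  # low-dimensional reconstruction
```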
The selection of 28 Zernike coefficients (corresponding to sixth-order polynomials) balances practical aberration characterization in optical systems with computational efficiency. In real-world microscopy imaging, higher-order aberrations typically contribute less [44]. Sixth-order Zernike polynomials can effectively capture dominant aberration types such as defocus, astigmatism, and coma, while higher-order terms primarily correspond to faint high-frequency distortions or noise with diminished physical interpretability [30,45]. Moreover, incorporating higher-order terms significantly increases model complexity and iterative optimization time. This trade-off prioritizes practical applicability in computational imaging scenarios where real-time performance is critical. Therefore, in our IKC-MA network model, we retain the first 28 Zernike polynomial terms.
Method Comparison
In computational optical imaging systems, the Zernike polynomial method is superior to PCA. First, each Zernike mode corresponds to classical optical aberrations (spherical aberration, coma, astigmatism, etc.) with clear physical interpretations. Second, Zernike polynomials are defined on a circular aperture, inherently orthogonal and geometrically consistent with optical apertures. The fixed, data-independent basis functions of Zernike polynomials demonstrate superior noise resistance compared to PCA modes, which depend on data variations. Additionally, mathematically, Zernike expansions can be rapidly computed via closed-form formulas, and high-order extensions are straightforward, whereas PCA requires iterative eigendecomposition.
[25] A. Kappeler, S. Ghosh, J. Holloway. Ptychnet: CNN based Fourier ptychography. International Conference on Image Processing (ICIP), 1712-1716(2017).
[27] F. Shamshad, F. Abbas, A. Ahmed. Deep Ptych: subsampled Fourier ptychography using generative priors. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7720-7724(2019).
[28] J. J. Gu, H. N. Lu, W. M. Zuo. Blind super-resolution with iterative kernel correction. Conference on Computer Vision and Pattern Recognition (CVPR), 1604-1613(2019).
[33] X. T. Wang, K. Yu, C. Dong. Recovering realistic texture in image super-resolution by deep spatial feature transform. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 606-615(2018).
[34] K. Zhang, W. Zuo, L. Zhang. Learning a single convolutional super-resolution network for multiple degradations. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3262-3271(2018).
[37] R. Timofte, E. Agustsson, L. Van Gool. NTIRE 2017 challenge on single image super-resolution: methods and results. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1110-1121(2017).
[38] E. Agustsson, R. Timofte. NTIRE 2017 challenge on single image super-resolution: dataset and study. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1122-1131(2017).
[42] C. Ledig, L. Theis, F. Huszár. Photo-realistic single image super-resolution using a generative adversarial network. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 105-114(2017).
[44] P. Pankajakshan, B. Zhang, L. Blanc-Feraud. Parametric blind deconvolution for confocal laser scanning microscopy. 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 6532-6535(2007).