The integration of deep learning into computational imaging has driven substantial advancements in coherent diffraction imaging (CDI). While physics-driven neural networks have emerged as a promising approach through their unsupervised learning paradigm, their practical implementation faces a critical challenge: measurement uncertainties in physical parameters (e.g., the propagation distance and the size of the sample area) severely degrade reconstruction quality. To overcome this limitation, we propose a deep-learning-enabled spatial sample interval optimization framework that synergizes physical models with neural network adaptability. Our method embeds the spatial sample interval as a trainable parameter within a PhysenNet architecture coupled with Fresnel diffraction physics, enabling simultaneous image reconstruction and system parameter calibration. Experimental validation demonstrates robust performance, with structural similarity (SSIM) values consistently maintained around 0.6 across diffraction distances spanning 10–200 mm, using a 1024 × 1024 region of interest (ROI) from a 1624 × 1440 CCD (pixel size: 4.5 μm) under 632.8 nm illumination. The framework also exhibits excellent fault tolerance: it maintains high-quality image restoration even when the propagation distance measurement error is large. In contrast to conventional iterative reconstruction algorithms, this approach transforms fixed parameters into learnable ones, simplifying the implementation of image restoration experiments and enhancing system robustness against experimental uncertainties. This work establishes, to our knowledge, a new paradigm for adaptive diffraction imaging systems capable of operating in complex real scenarios.
The synergistic integration of deep learning and computational imaging has driven transformative breakthroughs across multiple domains, including computational ghost imaging[1], digital holography[2], scattering medium reconstruction[3], fluorescence lifetime imaging[4], Fourier ptychographic microscopy[5], low-photon detection systems[6], and coherent diffraction imaging (CDI)[7]. Most research combining computational imaging with deep learning has adopted dataset-driven supervised learning, which requires large amounts of training data and demands that the distribution of test samples closely match that of the training dataset to achieve good reconstruction. Unlike dataset-driven methodologies that rely on input-label paired datasets as implicit neural network priors[8], physics-driven approaches employ physical models as explicit priors to guide both neural network inference and training. These unsupervised learning techniques operate using only measured samples as input data, manifesting through two principal paradigms: 1) untrained methods that iteratively optimize neural networks to infer phase/amplitude information from measured intensity images, and 2) trained methods that utilize physics-based forward models to generate synthetic datasets for network pretraining, enabling subsequent phase retrieval from unseen intensity measurements[9]. Next, a brief overview of the development of physics-model-driven methods is provided.
The foundational implementation of physics-driven frameworks emerged in Fourier ptychography through Boominathan et al.'s work[10], which demonstrated the generation of simulated datasets via forward models to evaluate and optimize generator networks. Subsequent developments have expanded this paradigm across computational imaging domains. Wang et al.[11] introduced PhysenNet, integrating Fresnel transfer-function propagators with U-Net architectures[12] for iterative phase retrieval of pure phase objects from diffraction patterns with known defocus distances; this method enables phase recovery from a single diffraction image without requiring extensive training data. Building on this foundation, Zhang et al.[13] developed BlindNet to alleviate the requirement for precise diffraction distance measurements. Yang et al.[14,15] extended the framework to complex amplitude objects, addressing reconstruction artifacts and noise through physics-informed loss functions that mitigate the ill-posed nature of inverse problems. Further innovations include Bai et al.'s[16] dual-wavelength extension for noise and twin-image suppression and Galande et al.'s[17] integration of explicit denoisers to counter overfitting in single-measurement optimization. Li et al.[18] enhanced reconstruction quality by incorporating two-dimensional Haar wavelet transforms into autoencoder architectures, introducing sparsity constraints during encoding/decoding while reducing reconstruction distortion. Zhang et al.[19] addressed limitations of conventional Fourier ptychographic microscopy (FPM)[20] through physics-embedded convolutional networks that compensate for optical aberrations without requiring extensive training data. Chen et al.[21] achieved single-shot hologram reconstruction using GAN-based frameworks[22] that synergize physical models with unsupervised learning.
However, the physics-driven forward models mentioned above face critical limitations in their dependence on precise experimental parameters, including the propagation distance ($z$), the size of the sample area ($L$), and the spatial sample interval ($dx$), since measurement uncertainties often degrade reconstruction quality or even cause complete failure.
To overcome this problem, a deep-learning-enabled spatial sample interval optimization framework based on PhysenNet, which synergizes physical models with neural network adaptability, is proposed in this paper. The remainder of the paper is organized as follows: Sec. 2 establishes the theoretical foundation of our approach, comprising three critical components: 1) the architecture of the U-Net framework, 2) the operational principles of the PhysenNet model, and 3) the implementation specifics of the spatial sampling interval ($dx$) optimization algorithm. Building upon this theoretical framework, Sec. 3 delineates the experimental protocol, including computational platform configurations, data acquisition procedures, and the empirical determination of optimal $dx$ values across varying diffraction distances ($z$). This section further provides quantitative validation of the optimized $dx$ through rigorous benchmarking. The experimental outcomes conclusively demonstrate the algorithm's capability to achieve both optimal sampling intervals and high-fidelity image restoration. Finally, Sec. 4 synthesizes the key contributions of this study, emphasizing the method's demonstrated effectiveness, computational robustness, and potential for generalization in complex imaging scenarios.
2. Theoretical Framework
The PhysenNet framework enables label-free network training through phase retrieval from a single diffraction pattern of the object. In this model, a monochromatic plane wave with wavelength $\lambda$ illuminates the object $U_0(x, y)$. After propagating through a diffraction distance $z$, the resultant diffraction pattern can be expressed via the Fresnel diffraction integral in its transfer-function form:

$$U(x, y; z) = \mathcal{F}^{-1}\{\mathcal{F}[U_0(x, y)] \cdot H(f_x, f_y)\}, \tag{1}$$

where

$$H(f_x, f_y) = \exp(jkz)\exp\left[-j\pi\lambda z\left(f_x^2 + f_y^2\right)\right] \tag{2}$$

represents the Fresnel propagator, with $f_x$ and $f_y$ denoting the spatial frequencies along the $x$- and $y$-axes, respectively. The operators $\mathcal{F}$ and $\mathcal{F}^{-1}$ correspond to the discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT), respectively. Notably, the $\exp(jkz)$ part does not affect the transverse spatial structure of the observation surface[23]; to improve computational efficiency, it can be ignored in the program. The intensity distribution of the diffraction pattern captured by the CCD is given by

$$I(x, y) = |U(x, y; z)|^2. \tag{3}$$
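For concreteness, a minimal PyTorch sketch of this transfer-function propagation is given below; the function name and the assumption of a square N × N field are ours, and the constant exp(jkz) factor is dropped as noted above.

```python
import torch

def fresnel_propagate(u0, wavelength, z, dx):
    """Fresnel propagation via the transfer function H of Eq. (2).

    u0: field at the object plane, shape (N, N); wavelength, z, dx in
    consistent units (e.g., all in mm).
    """
    N = u0.shape[-1]
    # Dividing fftfreq(N) by dx (instead of passing d=dx) keeps the
    # frequencies differentiable when dx is a learnable tensor (used later).
    f = torch.fft.fftfreq(N) / dx
    fx, fy = torch.meshgrid(f, f, indexing="ij")
    # Fresnel propagator, with the constant exp(jkz) factor omitted
    H = torch.exp(-1j * torch.pi * wavelength * z * (fx**2 + fy**2))
    u_z = torch.fft.ifft2(torch.fft.fft2(u0) * H)
    return u_z.abs() ** 2        # intensity at the CCD, Eq. (3)
```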
The basic workflow is illustrated in Fig. 1. The input diffraction intensity $I$ is processed through the U-Net to generate an estimated intensity distribution $\hat{O}$. In conventional neural networks, the ground truth image must be known to compute the error for updating the network's weights and biases. PhysenNet instead computes the simulated diffraction pattern $I'$ from $\hat{O}$ through the Fresnel propagator $H$, and then calculates the loss function between the measured diffraction pattern $I$ and the simulated pattern $I'$. The weights and biases are then optimized via gradient descent.
Through iterative optimization, the simulated diffraction pattern $I'$ converges toward the measured pattern $I$, while the estimate $\hat{O}$ simultaneously converges to a physically feasible solution, thereby successfully recovering the original image. It is evident that the iterative process in PhysenNet relies on the Fresnel propagator $H$, which requires precise knowledge of physical parameters such as the diffraction distance $z$ and the spatial frequencies $f_x$ and $f_y$. However, in practical experiments, measurement uncertainties in $z$ inevitably arise. Furthermore, as $z$ increases, a dynamic mismatch emerges between the spatial sample interval $dx$ and the spatially expanding diffraction pattern. These combined factors critically degrade the image recovery quality.
This study proposes incorporating the spatial sample interval $dx$ as an optimizable parameter in PhysenNet, effectively addressing the issues of measurement inaccuracies in the diffraction distance $z$ and the mismatch between $dx$ and the diffraction pattern. The corresponding objective function can be expressed as

$$(\hat{w}, \hat{b}, \widehat{dx}) = \arg\min_{w,\, b,\, dx} \mathcal{L}\{H_{dx}[R_{w,b}(I)],\, I\}, \tag{4}$$

where $R_{w,b}$ denotes the neural network, for which U-Net is employed as the architecture (Fig. 1), and $H_{dx}$ is defined through the physical model of diffraction described by Eqs. (1)–(3). The U-Net employed in this study features a symmetric encoder-decoder structure. The encoder hierarchically extracts multi-scale features through four stages of convolution kernels (channel progression: 1→32→64→128→256) coupled with max-pooling operations, progressively reducing the spatial resolution from 1024 × 1024 to 64 × 64, with each stage comprising dual convolutional layers, batch normalization, and LeakyReLU activation. The decoder restores spatial resolution via transposed convolutions (channel regression: 512→256→128→64→32), iteratively fused with encoder features through skip connections at corresponding resolution levels, culminating in dual 32-channel convolutions followed by a final convolution for image reconstruction, ensuring detail recovery and cross-scale contextual integration throughout the network.
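A compact PyTorch sketch of an encoder-decoder consistent with this description follows; the 3 × 3 kernel size, the LeakyReLU slope, and the 1 × 1 output head are our assumptions, as the text does not specify them.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """Dual conv + BN + LeakyReLU stage (3x3 kernels are an assumption)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2),
    )

class UNet(nn.Module):
    def __init__(self):
        super().__init__()
        chs = [32, 64, 128, 256]                      # encoder: 1->32->64->128->256
        self.enc = nn.ModuleList()
        c_prev = 1
        for c in chs:
            self.enc.append(block(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(256, 512)
        self.up = nn.ModuleList(                      # decoder: 512->256->128->64->32
            [nn.ConvTranspose2d(c * 2, c, 2, stride=2) for c in reversed(chs)]
        )
        self.dec = nn.ModuleList([block(c * 2, c) for c in reversed(chs)])
        self.head = nn.Conv2d(32, 1, 1)               # final 1x1 conv (assumption)

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # skip connection by concat
        return self.head(x)
```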
The detailed workflow is illustrated in Fig. 2. In the optimization algorithm [Fig. 2(a)], we first parameterize $dx$ as a learnable neural network variable and initialize it (to ensure that its physical meaning conforms to the hardware characteristics, its maximum value should be the CCD pixel size[19]). Accordingly, in this paper, the learning range of $dx$ is restricted to between 0.003 and 0.0045 mm. To enforce this constraint, an unconstrained scalar is first mapped to the [0, 1] interval through the sigmoid function, and $dx$ is then obtained through the linear transformation $dx = dx_{\min} + (dx_{\max} - dx_{\min}) \cdot \mathrm{sigmoid}(\theta)$, so that negative or excessive values are avoided. The step size of the update can be adjusted through the learning rate of the neural network.
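A minimal sketch of this constrained parameterization in PyTorch is shown below; the variable names and the raw-parameter initialization (midpoint of the range) are our assumptions.

```python
import torch

dx_min, dx_max = 0.003, 0.0045          # learning range of dx, in mm

# Unconstrained raw scalar; 0.0 initializes dx at the midpoint of the range
dx_raw = torch.zeros(1, requires_grad=True)

def current_dx():
    """Map the raw scalar into [dx_min, dx_max] via sigmoid + linear transform."""
    return dx_min + (dx_max - dx_min) * torch.sigmoid(dx_raw)
```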
During training, $dx$ uniformly adjusts the parameters in the physical model $H$, and PhysenNet generates the simulated diffraction image $I'$ through $H$. The loss between the simulated diffraction pattern $I'$ and the experimental diffraction pattern $I$ is then calculated. After backpropagation, gradient descent updates the weights and biases of the neural network and adjusts $dx$, and a new round of iterations begins. Through this iterative optimization, the initial $dx$ value gradually converges in the optimal direction, and the optimal reconstructed image and the optimal $dx$ value are obtained simultaneously after training. To verify whether the network has learned the best $dx$, we designed the fixed-$dx$ optimization algorithm in Fig. 2(b), in which the neural network stops learning $dx$ and $dx$ is instead set as a manually adjusted parameter. Apart from not outputting the optimal $dx$ value at the end, its steps are identical to those of the optimization algorithm.
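Putting the pieces together, a condensed sketch of one training iteration of this joint optimization [algorithm A, Fig. 2(a)] might look as follows, reusing the hypothetical `fresnel_propagate`, `UNet`, and `current_dx` helpers defined above; the two learning rates are taken from Sec. 3.1.

```python
# I_meas: measured diffraction pattern, shape (1, 1, N, N); z, wavelength known.
net = UNet()
optimizer = torch.optim.Adam(
    [{"params": net.parameters(), "lr": 1e-3},   # U-Net weights and biases
     {"params": [dx_raw], "lr": 9.5e-3}]         # distinct learning rate for dx
)

def train_step(I_meas, wavelength, z):
    optimizer.zero_grad()
    O_hat = net(I_meas)                                    # estimated object
    I_sim = fresnel_propagate(O_hat[0, 0], wavelength, z, current_dx())
    loss = torch.mean((I_sim - I_meas[0, 0]) ** 2)         # MSE term (TV term omitted here)
    loss.backward()                                        # gradients also flow into dx_raw
    optimizer.step()
    return loss.item()
```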
In this work, the mean squared error (MSE) loss was employed to enforce global structural alignment with the target, while total variation (TV) regularization loss refines local high-frequency features by suppressing noise artifacts and preserving edge sharpness.
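A sketch of such a combined loss is given below; applying the TV term to the network's estimate (rather than to the simulated pattern) and the weight value are both our assumptions, since the text does not specify them.

```python
def tv_loss(img):
    """Anisotropic total variation: mean absolute difference between neighbors."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

def combined_loss(I_sim, I_meas, o_hat, tv_weight=1e-4):
    """MSE between diffraction patterns plus TV regularization on the estimate."""
    mse = torch.mean((I_sim - I_meas) ** 2)
    return mse + tv_weight * tv_loss(o_hat)
```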
3. Experiments and Image Reconstruction
3.1. Experimental setup and procedures
PhysenNet was implemented using PyTorch 2.2 with CUDA 12.1 acceleration in a Python 3.12 environment. The Adam optimizer was utilized to jointly optimize the network weights $w$, biases $b$, and spatial sampling interval $dx$, with distinct learning rates of 0.001 for the U-Net parameters and 0.0095 for $dx$. Training converged within 10,000 epochs, requiring approximately 11 min on a workstation equipped with an Intel® Core™ i9-14900KF CPU (3.2 GHz), 128 GB of DDR5 RAM (4000 MHz), and an NVIDIA GeForce RTX 4090 GPU (24 GB VRAM).
Figure 3 illustrates the experimental setup. As depicted, a 632.8 nm He-Ne laser beam was collimated through an aperture and illuminated a USAF1951 resolution test target. The diffracted intensity pattern was subsequently captured by a CCD camera with an effective pixel array of 1624 × 1440 and a pixel pitch of 4.5 μm, where a 1024 × 1024 pixel region of interest (ROI) was selected to analyze the central diffraction features.
First, experiments were conducted using the traditional Gerchberg-Saxton (GS) and hybrid input-output (HIO) algorithms[24,25] to recover the image from the experimentally captured Fresnel-zone diffraction pattern [Fig. 4(b)]; the restored results are shown in Figs. 4(c) and 4(d). In the Fraunhofer model, the quadratic phase factor can be ignored and the propagation process simplifies to a Fourier transform, while in the Fresnel model it cannot be ignored. However, measurement errors exist in both the measured diffraction distance $z$ and the measured spatial sample interval $dx$, and both values enter directly into the propagator $H$, which prevents the GS and HIO algorithms from correctly restoring the image [Figs. 4(c) and 4(d)]. To accurately restore images, we must devise a new and more effective image restoration method.
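For reference, a minimal sketch of the classical GS iteration under Fresnel propagation is given below; the zero-phase (amplitude-object) constraint in the object plane and all function names are our assumptions, and $z$ and $dx$ enter as fixed measured values, which is exactly the sensitivity analyzed in this work.

```python
import torch

def propagate(u, wavelength, z, dx):
    """Fresnel transfer-function propagation returning the complex field."""
    N = u.shape[-1]
    f = torch.fft.fftfreq(N) / dx
    fx, fy = torch.meshgrid(f, f, indexing="ij")
    H = torch.exp(-1j * torch.pi * wavelength * z * (fx**2 + fy**2))
    return torch.fft.ifft2(torch.fft.fft2(u) * H)

def gs_retrieve(I_meas, wavelength, z, dx, n_iter=10000):
    """Gerchberg-Saxton with fixed z and dx (iteration count as in Fig. 4)."""
    amp = I_meas.sqrt()
    u = torch.ones_like(amp, dtype=torch.complex64)    # initial object guess
    for _ in range(n_iter):
        u_z = propagate(u, wavelength, z, dx)
        u_z = amp * torch.exp(1j * u_z.angle())        # enforce measured amplitude
        u = propagate(u_z, wavelength, -z, dx)         # back-propagate to object plane
        u = u.abs().to(torch.complex64)                # amplitude-object constraint (assumption)
    return u.abs()
```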
Figure 4.(a) Ground truth. (b) Diffraction pattern (z = 40 mm, pixel_size = 0.0045 mm). (c) Restored image by GS algorithm (iteration = 10000). (d) Restored image by HIO algorithm (iteration = 10000).
Neural networks have strong expressive power: they can model complex nonlinear propagation relationships, alleviate the mathematical ill-posedness of traditional phase recovery, and improve image restoration quality by integrating physical priors. Among these approaches, PhysenNet, which employs an untrained iterative method, has shown exceptional performance in recovering Fresnel diffraction patterns. This study investigates a spatial sample interval ($dx$) optimization framework based on PhysenNet.
Since $dx$ remains a fixed, possibly inaccurate value in the GS and HIO algorithms, preventing image restoration, we converted it into a learnable parameter in PhysenNet, as shown in the workflow of Fig. 2(a). During iterative network training, the initialized $dx$ value gradually converges toward the optimal direction (Fig. 5). Ultimately, after training, both the optimally reconstructed image and the best $dx$ value are obtained. To validate whether the network has learned the optimal $dx$, we designed the algorithm in Fig. 2(b), where $dx$ was excluded from network learning. By fixing different $dx$ values near the network-optimized $dx$, we performed image reconstruction (Fig. 6) and evaluated the recovery quality of diffraction patterns under varying $dx$ settings.
Figure 5.(a1)–(a10) Diffraction patterns corresponding to different diffraction distances z. (b1)–(b10) Recovered images corresponding to different diffraction distances z with trained dx using algorithm A.
Figures 5(a1)–5(a10) show the original experimental images, i.e., the diffraction patterns obtained at different diffraction distances $z$. Figures 5(b1)–5(b10) show the images restored by neural network optimization of $dx$ using algorithm A [Fig. 2(a)]. The reconstruction results within a distance range of 10–200 mm, shown in Fig. 5, indicate that the network algorithm can reconstruct images and is superior to the traditional GS and HIO algorithms (Fig. 4, with fixed values of $z$ and $dx$). This work uses the structural similarity index (SSIM) to evaluate the quality of reconstructed images; SSIM is a widely used image similarity metric that captures the structural information of images and is more consistent with human visual perception. In the experiments with diffraction distances of 10–200 mm, the SSIM values obtained by this algorithm were between 0.5 and 0.65. This range indicates that the reconstructed images have high similarity with the original image, demonstrating that the algorithm achieves good reconstruction quality.
From the reconstruction results in Fig. 5 using algorithm A, it can be seen that when utilizing physical modeling methods for image restoration, precise matching through the adjustment of the spatial sample interval $dx$ is crucial to achieving optimal restoration. By treating $dx$ as a trainable variable, the U-Net automatically determines the optimal $dx$ value to accommodate varying diffraction distances $z$. This approach integrates image restoration with the simultaneous search for optimal $dx$ values, allowing the network to self-adapt to shifts in experimental conditions and minimizing over-reliance on measured values.
To confirm that the trained neural network has indeed learned the optimal $dx$ value, we stopped the network from learning $dx$ and restored the image using algorithm B with fixed $dx$ values near the network-learned optimum, sampled at a fixed stride. The reconstruction results are shown in Figs. 6 and 7: Fig. 6 shows the restored images corresponding to different $dx$ at diffraction distances including 70, 130, and 170 mm, and Fig. 7 shows their SSIM values (that is, Fig. 6 shows only a subset of the restorations summarized in Fig. 7). When restoring with a smaller or larger $dx$ (compared with the optimized $dx$), the restoration quality decreases, which indicates that the neural network has indeed learned the optimal $dx$ value.
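A sketch of this fixed-$dx$ sweep (algorithm B) appears below; `reconstruct_fixed_dx` stands in for retraining the network of Sec. 2 with $dx$ held constant, and the stride value is a placeholder, since the actual stride is not stated in the text.

```python
from skimage.metrics import structural_similarity as ssim

def sweep_fixed_dx(I_meas, ground_truth, wavelength, z, dx_center, stride=1e-5, n=5):
    """Algorithm B: reconstruct with dx frozen at values around the learned
    optimum and score each result by SSIM against the ground truth.

    ground_truth and the reconstructions are NumPy arrays.
    """
    scores = {}
    for k in range(-n, n + 1):
        dx = dx_center + k * stride                            # placeholder stride
        img = reconstruct_fixed_dx(I_meas, wavelength, z, dx)  # hypothetical helper
        scores[dx] = ssim(ground_truth, img,
                          data_range=float(ground_truth.max() - ground_truth.min()))
    return scores
```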
Figure 7.Specific SSIM values for different z with different fixed dx. (a) z = 10–50 mm; (b) z = 60–100 mm; (c) z = 110–150 mm; (d) z = 160–200 mm.
When employing physics-model-based methods for image recovery, optimal reconstruction necessitates precise spatial sample interval ($dx$) matching. By parameterizing $dx$ as a learnable variable within the U-Net, the network autonomously optimizes $dx$ across varying diffraction distances, demonstrating adaptive parameter tuning. In contrast, fixing $dx$ to the pixel size of the CCD in the physical model (as in angular spectrum diffraction theory, where $dx$ equals the CCD pixel size) results in failed reconstructions (Fig. 6). This joint optimization of $dx$ and image recovery enables the network to dynamically adapt to experimental variations, mitigating over-reliance on manual measurements.
Figure 8 further quantifies the optimal $dx$–$z$ relationship, revealing a statistically significant positive correlation between $dx$ and the propagation distance (Pearson correlation analysis). This phenomenon arises from the spatial frequency modulation in Fresnel diffraction. As $z$ increases, the quadratic phase term $\exp[-j\pi\lambda z(f_x^2 + f_y^2)]$ imposes stronger high-frequency suppression, effectively narrowing the spatial frequency bandwidth. Increasing $dx$ under larger $z$ equivalently performs frequency-domain downsampling, ensuring alignment between the sampling interval and the diffraction-limited bandwidth. This adaptive strategy prevents high-frequency-noise amplification while preserving reconstruction quality across varying experimental configurations.
Figure 8.Specific SSIM values for different z corresponding to different best dx values.
The experiments in Figs. 9 and 10 are designed to quantitatively evaluate the robustness of the proposed method to diffraction distance measurement errors. Figure 9 shows the results of image restoration under different deviation distances $\Delta z$ for measured diffraction distances of 50 and 100 mm, and Fig. 10(a) shows the SSIM values of the restored images (near $z_{measured} = 50$ mm) together with the $dx$ value learned adaptively by the network. The experimental results show that when the measurement error kept $z_{input}$ within 40–60 mm, the SSIM of the restored image remained stable above 0.6, with the network compensating for the frequency-domain distortion caused by the distance deviation by dynamically adjusting the value of $dx$. Beyond this range, the original image could not be restored. These results show that the effectiveness of the method within the critical threshold and its failure beyond it essentially reflect the interaction boundary between the network parameter adaptation mechanism and the physical model constraint: when the measurement error is within the critical threshold, the network can adjust $H$ in Eq. (2), restoring images through the frequency-domain compensation realized by $dx$. Below, we provide a detailed mathematical derivation and theoretical analysis of the experimental results within the linear range of Fig. 10(a).
Figure 9. Restored images with different Δz when the diffraction distance (a) $z_{measured}$ = 50 mm and (b) $z_{measured}$ = 100 mm.
Figure 10. (a) When $z_{measured}$ = 50 mm, the trained $dx$ ($dx_{adjust}$) and SSIM values obtained using different $z_{input}$ and algorithm A, corresponding to the restored images in Fig. 9(a). (b) When $z_{measured}$ = 100 mm, the trained $dx$ ($dx_{adjust}$) and SSIM values obtained using different $z_{input}$ and algorithm A, corresponding to the restored images in Fig. 9(b). (c) Tolerance of the distance uncertainty. Error bars denote the maximum tolerance of the initial $z_{input}$ value with respect to $z_{measured}$ that PhysenNet allows.
In the DFT, the spatial frequency resolution is determined by the spatial sampling interval $dx$ and the number of sampling points $N$:

$$\Delta f_x = \Delta f_y = \frac{1}{N\,dx}. \tag{5}$$

It can be seen that the spatial frequencies $f_x = m\,\Delta f_x$ and $f_y = n\,\Delta f_y$ (for integer indices $m$ and $n$) are inversely proportional to $dx$. To generate a simulated pattern $I'$ that iterates in the correct direction, the phase term of $H$ must remain unchanged; therefore, when $z_{input}$ (the input value of the network) is inconsistent with $z_{measured}$, $dx$ is adjusted so that the phase terms match:

$$z_{input}\left(f_x'^2 + f_y'^2\right) = z_{measured}\left(f_x^2 + f_y^2\right), \tag{6}$$

where $f_x'$ and $f_y'$ are the adjusted spatial frequencies corresponding to $dx_{adjust}$, and $f_x$ and $f_y$ are the spatial frequencies corresponding to the optimal $dx_{optimal}$. With Eq. (5), we can get

$$f_x' = \frac{m}{N\,dx_{adjust}}, \tag{7}$$

$$f_y' = \frac{n}{N\,dx_{adjust}}. \tag{8}$$

Substituting $f_x'$ and $f_y'$ into the phase equivalence condition of Eq. (6) and eliminating the common factor $(m^2 + n^2)/N^2$, we obtain

$$dx_{adjust} = dx_{optimal}\sqrt{\frac{z_{input}}{z_{measured}}}. \tag{9}$$
Equation (9) theoretically elucidates the relationship between $dx_{adjust}$ and $z_{input}$, aligning with the red curve for $z_{input}$ values ranging from 40 to 60 mm, over which the image can be restored, as shown in Figs. 10(a) and 10(b).
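As a quick numerical check of Eq. (9) (with an assumed optimal $dx$ of 0.004 mm, since the learned values are read off Fig. 10 rather than tabulated):

```python
import math

dx_optimal = 0.004            # mm; assumed value for illustration
z_measured = 50.0             # mm
for z_input in (40.0, 50.0, 60.0):
    dx_adjust = dx_optimal * math.sqrt(z_input / z_measured)  # Eq. (9)
    print(f"z_input = {z_input:5.1f} mm -> dx_adjust = {dx_adjust:.5f} mm")
# Prints 0.00358, 0.00400, and 0.00438 mm: all inside the [0.003, 0.0045] mm
# search range, consistent with successful recovery for z_input in 40-60 mm.
```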
As revealed in Fig. 10(a), when the measurement error is within the tolerable range, the $dx_{adjust}$ newly learned by the neural network closely follows the relationship defined in Eq. (9), which demonstrates the strong robustness of the proposed method against measurement deviations. When $z_{input}$ falls outside the 40–60 mm range, the nonlinear distortion caused by the error is beyond the adjustable range of the network parameters. The large error in the optimization process prevents the loss between the simulated diffraction pattern generated by $H$ and the real diffraction pattern from being effectively reduced, so the gradient cannot update in the right direction, the optimization stagnates, and recovery fails. Figure 10(b) provides another powerful illustration of this conclusion.
Figure 10(c) shows the maximum allowable measurement error (error bar interval) corresponding to different measured diffraction distances $z_{measured}$, further verifying the above conclusion: when the input distance deviation is within the critical range identified by the error bars (such as $z_{input}$ values ranging from 130 to 170 mm), the network can realize effective compensation by adjusting $dx$ ($dx_{adjust}$). Once $\Delta z$ exceeds this range, even if the $dx_{adjust}$ required by theory is still within the preset parameter range [0.003 mm, 0.005 mm], the neural network cannot recover the image because of the large deviation. In that case, it is necessary to further adjust the preset parameters, increase the training time, or adjust the learning rate of $dx$ to obtain a better recovery effect.
The proposed method, which adjusts the spatial sample interval $dx$ to address measurement inaccuracies in physical parameters, demonstrates significant tolerance to distance uncertainties. By performing image recovery with multiple initial values deviating from the true diffraction distance, the results (Fig. 10) establish the maximum allowable error between the measured and actual diffraction distances; for example, the maximum allowable errors at diffraction distances of 50 and 200 mm can be read directly from the error bars in Fig. 10(c). Within these error ranges, the SSIM of the reconstructed images remains stable between 0.5 and 0.7. This confirms the method's ability to maintain high reconstruction quality, highlighting its robustness against experimental parameter uncertainties. Consequently, the method effectively mitigates the impact of distance measurement errors in practical image recovery.
4. Conclusion
This work proposes a deep learning method with dynamic spatial sample interval optimization to address the reconstruction accuracy degradation caused by physical parameter measurement errors in coherent diffraction imaging. By integrating the sample interval $dx$ as a learnable parameter into the PhysenNet framework and coupling it with the Fresnel propagation model, adaptive compensation for propagation distance errors and $dx$ mismatch was achieved. Experimental validation demonstrates stable reconstruction quality (SSIM 0.5–0.7) across varying diffraction distances (10–200 mm) under the following conditions: a CCD with an effective pixel array of 1624 × 1440, a pixel size of 4.5 μm, and a cropped 1024 × 1024 input. The method tolerates substantial distance measurement errors, with the maximum tolerance depending on the propagation distance.
Further analysis reveals that optimal $dx$ values require progressive adjustment with increasing $z$ to accommodate wavefield divergence characteristics, confirming the necessity of dynamic parameter optimization. Compared to conventional methods relying on precise parameter calibration, this strategy significantly enhances experimental fault tolerance and reduces reliance on high-precision instrumentation.
It is worth noting that when $z$ exceeds 150 mm, the limited sampling width of the CCD during diffraction-pattern capture causes the high-frequency components of the original image to be lost, ultimately decreasing the edge sharpness, detail resolution, and SSIM of the restored image. Missing high-frequency components restrict the network's ability to recover fine details, even with physics-guided parameter optimization. Nevertheless, the proposed dynamic adjustment method maximizes the utilization of frequency components within the CCD's effective bandwidth, effectively mitigating the ill-posed problem of diffraction image recovery. Future work will focus on developing multiparameter joint optimization frameworks and extending applications to biological microscopy and industrial nondestructive testing.
[22] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative adversarial networks" (2014).
[23] D. G. Voelz, Computational Fourier Optics: A MATLAB Tutorial (SPIE Press, 2011).
[24] R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of phase from image and diffraction plane pictures," Optik 35, 237 (1972).