Advanced Photonics, Volume 7, Issue 5, 054002 (2025)

Deep learning for computational imaging: from data-driven to physics-enhanced approaches

Fei Wang, Juergen W. Czarske, and Guohai Situ*

Computational imaging (CI) leverages the joint optimization of optical system design and reconstruction algorithms, enabling superior performance in terms of dimensionality, resolution, efficiency, and hardware complexity. It has found widespread applications in medical diagnosis and astronomy, among others. Recently, deep learning (DL) has changed the paradigm of CI by harnessing learned priors from data through trained neural network models. However, widely used data-driven DL-based CI methods encounter difficulties related to training data acquisition, computation requirements, generalization, and interpretability. Recent studies have indicated that integrating the physics prior of the CI system into various components of DL pipelines (including training data, network design, and loss functions) holds promise for alleviating these challenges. To provide readers with a better understanding of the current research status and ideas, we present an overview of the state-of-the-art in DL-based CI. We begin by briefly introducing the concepts of CI and DL, followed by a comprehensive review of how DL addresses inverse problems in CI. Particularly, we focus on the emerging physics-enhanced approaches. Finally, we highlight perspectives on future research directions and the transfer to real-world applications.

1 Introduction

As an advanced imaging methodology, computational imaging (CI) integrates computation with optical hardware for image formation. Such computation typically encompasses the processing of analog and digital signals, corresponding to the encoding and decoding process. Encoding is implemented by modulating the light field on the object surface, aperture plane, and image plane (or their conjugate planes), resulting in the techniques of coded illumination, coded apertures, and coded detection (see Fig. 1, upper part), respectively.

Figure 1. Schematic diagram of a computational imaging system. The object f is modulated by properly designed encoding components, forming the measurement g=H(f). The object image f* can be reconstructed from g provided prior information about the underlying imaging system and f.

The introduction of encoding components breaks the “point-to-point” object–image relationship that conventional optical imaging systems are optimized to achieve. This means that the intensity patterns captured by the camera sensor are not necessarily the direct images of the objects in the geometric optics sense, but the encoded patterns from which suitable algorithms should be developed to computationally reconstruct the object images (i.e., decoding). Using this encoding–decoding pipeline, CI excels in various aspects. First, it captures not only the intensity but also the phase/propagation direction,1 phase space,2 polarization state,3 depth,4 spectra,5 time of flight,6 topological charges,7 and even the quantum state of light,8 although indirectly. Second, it can break the limits of the space-bandwidth product (SBP) of the optical system, thus improving the imaging performance in terms of resolution,9–11 field of view,12,13 and depth of field.14 Third, it can also simplify imaging systems by removing lenses, offering unique advantages such as single-shot lensless 3D imaging,4,15 and it delivers exceptional performance in environments that are relatively dark16,17 and strongly inhomogeneous,18–21 where traditional imaging techniques struggle.

As mentioned above, the imaging process in CI relies on well-designed computational algorithms to recover the object image f from the raw detected signal g, which is encoded through the CI system H (see Fig. 1). This represents a typical inverse problem. Traditional inverse solvers in CI can generally be categorized into physics-based and model-based methods,22 both using H and g to reconstruct an estimate of f, i.e., f* (see Fig. 1, lower part). Physics-based methods produce object images by explicitly inverting the image formation operator, e.g., $f^* = H^{-1}(g)$, offering advantages in computational speed and interpretability. However, these methods require full characterization of the inverse operator $H^{-1}$ and, owing to the loss of information, multiple measurements acquired by suitable means for high-quality reconstructions.22 Model-based methods (termed iterative methods in some literature23), grounded in optimization theory, aim to iteratively refine the reconstruction. Their objective is twofold: to adhere to the constraints imposed by the measurements and to align with handcrafted priors such as smoothness and sparsity. However, iterative optimization usually requires computation time ranging from several minutes to hours on a typical laptop, often demanding several gigabytes of memory to store the image formation operator and intermediate data.24,25 In addition, the regularization term must be carefully designed based on appropriate assumptions.23

During the past decade, we have witnessed a paradigm shift in solving inverse problems in CI using deep learning (DL). DL is a class of algorithms that utilizes multilayer deep neural networks (DNNs) with a vast number of parameters, the values of which are iteratively adjusted to empirically “learn” the implicit relationships among a set of training data.26 This learning process is also called training. Once a DNN model has been trained, it can be used for the related tasks. Recently, several review articles have been published on the use of such data-driven DL for CI.24,27–35 One can refer to these articles for a more thorough overview. Here, we would like to emphasize three well-known issues. First, a network model trained by a set of data usually performs well only on test data that obey more or less the same statistical distribution as that training set.36 Second, the reasons behind the superior performance of DL remain challenging to interpret.37 Finally, the selection of neural network models often lacks a theoretical explanation.38

One way to address the aforementioned issues is to incorporate physical knowledge of the CI system into DNNs. Such a physics-enhanced DL approach leverages both implicit priors from training data and explicit physics priors from the imaging system for improved image reconstruction. Consequently, physics-enhanced DL methods offer significant advantages in terms of measurement requirements, versatility, and interpretability. Figure 2 qualitatively compares the performance of different CI reconstruction algorithms across these aspects (see Sec. 5.5 for quantitative comparisons). For measurement requirements, the physics-based methods obtain images through analytical calculations without considering any prior information about the object, thereby requiring many measurements to achieve high-quality results. By contrast, model-based reconstruction algorithms introduce handcrafted priors by adding regularization terms to the objective function during iterative optimization. Due to the inclusion of prior information, these algorithms can achieve the same or better image quality with fewer measurements compared with purely physics-based methods. Data-driven DL further reduces the number of required measurements by taking advantage of the rich implicit prior information contained in the training data. Physics-enhanced DL methods combine all available explicit and implicit prior information, reducing the necessary number of measurements as much as possible.

Figure 2. Differences in measurements and priors required by different reconstruction methods for the same imaging quality. Measurements: the amount of data recorded directly by the detector containing object information. Priors: explicitly defined image priors, image formation operators, and implicitly expressed priors based on a parameter model. Versatility: linked to universality. Interpretability: connected to XAI. Image quality: related to contrast, SSIM, and correlation coefficients. Each method introduces different types of prior information in unique ways, resulting in differences in versatility, interpretability, and the required measurements to achieve similar imaging quality. The circle diameter represents the versatility, whereas the gray level is related to interpretability. A quantitative evaluation can be found in Sec. 5.5.

Regarding versatility, how well the introduced prior information suits the actual scenario determines the versatility of a reconstruction method. Physics-based methods, which do not rely on image priors, exhibit the best versatility. Model-based methods also offer good versatility, as handcrafted priors capture general image properties and can be controlled by adjusting regularization parameters. By contrast, the performance of data-driven DL is often tied to training data. It requires complex retraining to be applied to different types of test data, resulting in poor versatility. Physics-enhanced DL methods address the generalization issue of data-driven methods by incorporating physical models, providing excellent versatility. In terms of interpretability, physics-based, model-based, and physics-enhanced DL methods produce results that adhere to physical model constraints, thereby offering good interpretability. By contrast, data-driven DL methods are often perceived as black boxes with limited interpretability. Due to their advantages in reducing measurement requirements, enhancing versatility, and improving interpretability, physics-enhanced DL methods have garnered significant attention in recent years.

This review aims to provide a comprehensive examination of DL-based CI, with a particular focus on the latest advancements in physics-enhanced or physics-informed DL techniques. It is important to recognize that CI is a broad field that includes numerous specialized areas. However, this review will not explore these specific subfields in depth. For readers seeking more detailed insights into these subfields, we recommend consulting the comprehensive review articles and books available, such as the ones referenced in Refs. 23 and 39. In addition, this review will not extensively cover the various neural network architectures as that is not the primary focus of our discussion. Those interested in a more in-depth exploration of neural network structures are encouraged to refer to the following sources: Refs. 40–42. It should be noted that integrating human knowledge with DL is a hot topic across multiple disciplines.43–46 However, this paper focuses exclusively on its application in CI.

The remainder of this paper is organized as follows. Section 2 provides a concise introduction to the concept of inverse problems in CI. Section 3 covers the basics of DL (readers with relevant prior knowledge may skip this part). Section 4 explains how data-driven DL solves inverse problems in CI and discusses its advantages and disadvantages. Section 5 summarizes how recent literature integrates physics priors with DL, followed by a comparison of different implementations. Finally, Sec. 6 outlines future research directions. To ensure clarity and facilitate reference, the abbreviations and symbols that appear throughout this paper are summarized in Table 1.

Table 1. Abbreviations and symbols.

| Abbreviation | Meaning | Symbol | Meaning |
| --- | --- | --- | --- |
| AI4S | AI for science | $X$ | Object space |
| AIGC | AI generative contents | $Y$ | Measurement space |
| ART | Algebraic reconstruction technique | $g$ | Element in $Y$ |
| AWGN | Additive white Gaussian noise | $f$ | Element in $X$ |
| BP | Back propagation | $H$ | Forward model |
| CGH | Computer-generated holography | $\mathcal{N}$ | Noise distribution |
| CI | Computational imaging | $P$ | Probability density |
| CNN | Convolutional neural network | $\alpha$ | Regularization parameter |
| CS | Compressed sensing | $\theta$ | Parameters in NN |
| CT | Computed tomography | $W$ | Weights |
| D2NN | Diffractive deep neural network | $b$ | Biases |
| DGI | Deep gradient descent | $\sigma(\cdot)$ | Activation function |
| DIP | Deep image prior | $\eta$ | Learning rate |
| DL | Deep learning | $R_\theta$ | NN mapping operator |
| DNN | Deep neural network | $x$ | NN input |
| FFDNET | Fast and flexible denoising CNN | $y$ | NN label |
| FPM | Fourier ptychographic microscopy | $x_i$ | $i$'th data of $x$ |
| FPP | Fringe projection profilometry | $y_i$ | $i$'th data of $y$ |
| GAN | Generative adversarial network | $D$ | Training set |
| GD | Gradient descent | $L$ | Loss function |
| GS | Gerchberg–Saxton | $\nabla_\theta L$ | Gradient |
| INR | Implicit neural representation | $N$ | Number of training data |
| MAE | Mean absolute error | $B$ | Batch size |
| MAP | Maximum a posteriori | $P$ | Number of object–measurement pairs |
| MLE | Maximum likelihood estimation | $Q$ | Number of measurement pairs |
| MRI | Magnetic resonance imaging | | |
| NN | Neural network | | |
| PCLF | Physics-consistency loss function | | |
| PnP | Plug-and-play | | |
| RNN | Recurrent neural network | | |
| SBP | Space-bandwidth product | | |
| SGD | Stochastic gradient descent | | |
| SLM | Spatial light modulator | | |
| SNR | Signal-to-noise ratio | | |
| SPI | Single pixel imaging | | |

2 Inverse Problem in Optical Imaging

As schematically illustrated in Fig. 3, traditional imaging encounters substantial information loss. CI mitigates this problem by employing proper encoding techniques, which allows for the acquisition of more information. Typically, due to the incorporation of encoding, the imaging model of CI can be formulated in the following form:

$$g \sim \mathcal{N}\bigl(H(f)\bigr), \tag{1}$$

where f and g are elements in X (object space) and Y (measurement space), respectively. The mapping function $H: X \to Y$ can be regarded as the forward operator of the imaging problem, which is determined by factors such as the way the system is illuminated, the way the object is encoded, and the way the image is captured. In many cases, f and g are represented as one-dimensional vectors, although this is not necessary in some other cases. $\mathcal{N}(H(f))$ represents a noise distribution characterized by an expectation of H(f). This distribution is typically represented as an additive Gaussian noise process, a Poisson random process, or sometimes as a combination of both.47 In practical CI systems, noise can originate from various sources, including disturbances in the light source, environmental factors, photoelectric conversion, electrical amplifiers, quantization in analog-to-digital conversion, and uncertainty associated with the forward operator H.
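As a concrete illustration of Eq. (1), the following minimal Python sketch (our illustration, not code from the paper) simulates a measurement for a toy linear forward operator H under the two noise models mentioned above; the dimensions and noise levels are arbitrary assumptions.

```python
# Minimal sketch of Eq. (1): g ~ N(H(f)) for a toy linear forward operator.
# All dimensions, the operator H, and the noise levels are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n, m = 256, 64                                  # m < n: the encoding is lossy
H = rng.standard_normal((m, n)) / np.sqrt(n)    # toy linear forward operator
f = rng.random(n)                               # toy object

clean = H @ f                                   # noiseless measurement H(f)

# Additive white Gaussian noise (e.g., readout and amplifier noise)
g_awgn = clean + rng.normal(0.0, 0.01, size=m)

# Poisson (photon shot) noise: scale to photon counts, sample, rescale
photons = 1e4
g_poisson = rng.poisson(np.clip(clean, 0.0, None) * photons) / photons
```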

Figure 3. Representation illustrating the relationship between objects and images in traditional imaging and computational imaging. The decrease in area in measurement space Y compared with object space X signifies information loss during the imaging process. Traditional imaging loses a significant amount of information, whereas computational imaging, through the introduction of encoding, can record more information than traditional imaging.

For many CI tasks, such as tomography, ghost imaging, and coded aperture imaging, g does not represent the direct image of f. Instead, one must solve the inverse problem of Eq. (1) to computationally reconstruct an estimate, f*, of f. If the forward model H is known or can be calibrated, this can be formulated as a parameter estimation problem and solved using, for example, a maximum likelihood estimation (MLE) approach (taken over a single sample),22

$$f^* = \arg\max_f P(g|f) = \arg\min_f \bigl[-\log P(g|f)\bigr], \tag{2}$$

where the likelihood function P(g|f) has a closed form if we assume that $\mathcal{N}(\cdot)$ obeys some typical statistical distribution. For example, $P(g|f) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(H(f)-g)^2}{2\sigma^2}\right)$ when $\mathcal{N}(\cdot)$ is additive white Gaussian noise (AWGN) with standard deviation σ. In this case, Eq. (2) takes the form

$$f^* = \arg\min_f \bigl[-\log P(g|f)\bigr] = \arg\min_f \lVert H(f) - g \rVert_2^2. \tag{3}$$

It is clear that we now have the objective function in the least-squares sense.

Unfortunately, inverse problems such as Eq. (3) are typically ill-posed because of two main factors. First, although CI allows for the recording of more information about the object compared with traditional imaging techniques, it is often infeasible or prohibitively expensive for the forward operator H to project the complete information of the object from the object space X to the measurement space Y (as schematically illustrated in Fig. 3). Second, the presence of noise introduces ambiguity.23 Examples of ill-posed inverse problems in CI include the linear inverse problem featuring a measurement matrix H with nonfull rank, the phase retrieval problem where only intensity diffraction patterns are measured, and the super-resolution problem where only low-frequency information is available. The ill-posedness of the inverse problem usually leads to multiple distinct solutions that all fit the measurements. Consequently, the MLE approach cannot ensure good results for ill-posed inverse problems.

One of the most straightforward strategies to solve the ill-posed inverse problem in CI is to obtain more high-quality measurements. Multiheight phase retrieval, ghost/single-pixel imaging with a high sampling ratio, synthetic aperture optical diffraction tomography, and ptychography using high overlap are some of the typical examples. This improves the measurement diversity and reduces the ill-posedness of the inverse problem, thereby providing a more stable and feasible solution. However, the demand for additional measurements increases the burden of data acquisition, leading to decreased efficiency.

Another way to solve the ill-posed inverse problem in CI involves incorporating prior information to compensate for the lost information. This method simultaneously utilizes measurement signals and prior knowledge regarding the probable characteristics of f (e.g., sparse or smooth), hence categorizing it as a form of maximum a posteriori (MAP) estimation,

$$f^* = \arg\max_f P(f|g) = \arg\max_f P(g|f)P(f) = \arg\min_f \bigl[-\log P(g|f) - \log P(f)\bigr]. \tag{4}$$

For the special case of AWGN and $f \sim \mathcal{N}\bigl(0, \tfrac{1}{\alpha} I\bigr)$, the MAP formulation leads to

$$f^* = \arg\min_f \bigl[-\log P(g|f) - \log P(f)\bigr] = \arg\min_f \lVert H(f) - g \rVert_2^2 + \alpha \lVert f \rVert_2^2. \tag{5}$$

The optimization objective function comprises two primary components: the data discrepancy term and the regularization term, with a scale parameter denoted as α. The former assesses the congruence between the estimated f* and the current measurements, whereas the latter evaluates the degree to which the estimated f* adheres to prior assumptions.
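To make Eq. (5) concrete, the sketch below minimizes the Tikhonov-regularized objective for a linear forward operator by plain gradient descent; the step size, iteration count, and α are illustrative defaults rather than recommendations from the paper.

```python
# Gradient descent on the MAP objective of Eq. (5) for a linear H:
#   min_f ||H f - g||_2^2 + alpha ||f||_2^2
import numpy as np

def map_tikhonov(H, g, alpha=0.1, lr=0.05, iters=2000):
    """Minimize the data discrepancy plus the Tikhonov regularization term."""
    f = np.zeros(H.shape[1])
    for _ in range(iters):
        grad = 2 * H.T @ (H @ f - g) + 2 * alpha * f   # data term + prior term
        f -= lr * grad
    return f

# For linear H, Eq. (5) also admits the closed-form Tikhonov solution
# f* = (H^T H + alpha I)^{-1} H^T g, which serves as a correctness check.
```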

In contrast to the MLE approach, the essence of MAP lies in seeking a solution that not only fits the measurements but also complies with prior knowledge. The interplay between data discrepancy and prior assumptions facilitates the identification of the most probable solution among the potential candidates for f that adhere to the image formation model, resulting in an effective strategy to solve ill-posed inverse problems in CI. Note that one can choose different forms of the prior distribution P(f), leading to different forms of regularization terms. Tikhonov regularization,48 sparsity regularization,49 and total variation regularization50 are some typical examples.
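For reference, these priors correspond to the following standard regularizer forms (textbook notation, not quoted verbatim from this review); here $\Psi$ denotes a sparsifying transform such as a wavelet basis:

```latex
% Standard regularizers R(f) induced by typical prior choices P(f)
\begin{align*}
  \text{Tikhonov:}        \quad & R(f) = \alpha \lVert f \rVert_2^2, \\
  \text{Sparsity:}        \quad & R(f) = \alpha \lVert \Psi f \rVert_1, \\
  \text{Total variation:} \quad & R(f) = \alpha \lVert \nabla f \rVert_1.
\end{align*}
```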

This approach significantly improves imaging system efficiency by striking a delicate balance between measurements and prior knowledge. It has been widely applied in various CI tasks, including compressive imaging,6,51,52 phase imaging,53–55 lensless imaging,4,56 and super-resolution imaging,57 among many others. However, three fundamental challenges remain. First, the iterative optimization process is often computationally intensive, limiting its applicability in real-time imaging, even when powerful graphics processing units (GPUs) are used. Second, designing a proper prior distribution for f is crucial as it can bias the final results. Last, the requirement for knowledge of the forward operator H can be challenging to meet, particularly in complex systems such as imaging through scattering media.

The emerging DL-based approach, which learns implicit priors from training data, offers a new paradigm to address issues present in traditional inverse solvers. In the following sections, we will dive deeper into DL-based CI.

3 Deep Learning Basics

Like traditional machine learning algorithms such as the support vector machine (SVM) and random forest, DL is designed to learn from data and acquire empirical knowledge.26 The significant difference is that most DL technology leverages multilayer neural networks (NNs) for automatic feature representation learning, eliminating the need for the feature engineering inherent in traditional machine learning algorithms.38 The function of an NN can be easily established. For example, suppose that the input of an NN with L layers is x, the weights between layers l−1 and l are $W_{l-1}$, the biases are $b_{l-1}$, and the activation function is σ(·). The output of the neural network can be calculated through forward propagation (see Fig. 4),

$$\mathrm{NN}(x) = \sigma\bigl(W_{L-1} \cdots \sigma\bigl(W_1 \sigma(W_0 x + b_0) + b_1\bigr) \cdots + b_{L-1}\bigr), \tag{6}$$

which means that the output of the neural network is determined by the values of a set of network parameters (weights and biases) $\theta = \{W_l, b_l\}_{l=0}^{L-1}$. Consequently, an NN can be seen as a parametric mapping model $R_\theta$. If the parameter space of θ is sufficiently large, $R_\theta$ is theoretically capable of simulating the mapping process between any input and output using appropriately determined weights θ.58 This provides a good parametric model to learn from large-scale data with complex structures.
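The forward propagation of Eq. (6) can be written in a few lines. The following numpy sketch (our illustration) applies the activation σ(·) after every affine layer, with random parameters standing in for trained weights.

```python
# Minimal numpy sketch of the forward propagation in Eq. (6).
import numpy as np

def forward(x, weights, biases, sigma=np.tanh):
    """Propagate x through an L-layer fully connected network."""
    a = x
    for W, b in zip(weights, biases):
        a = sigma(W @ a + b)          # one layer: affine map, then activation
    return a

# Example: a 3-layer network mapping R^4 -> R^2 with random parameters
rng = np.random.default_rng(1)
dims = [4, 8, 8, 2]
weights = [rng.standard_normal((dims[l + 1], dims[l])) for l in range(3)]
biases = [np.zeros(dims[l + 1]) for l in range(3)]
y = forward(rng.standard_normal(4), weights, biases)
```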

Figure 4. Illustration of forward propagation and backpropagation in a multilayer neural network. The process of forward propagation involves sequentially processing the input data through each layer, yielding the prediction of the given input data at the output layer. Backpropagation, on the other hand, propagates the error between the prediction and the label from the output layer back to the input layer. The gradient used for optimizing network parameters at each layer can be computed based on information obtained during both forward and backward calculations. For the sake of conciseness, we have not taken the bias into account in this context as it can also be integrated as part of the weight parameter.

Suppose a training set $D = \{(x_i, y_i);\ i = 1, 2, \ldots, N\}$, where the input $x_i$ and the label $y_i$ are combined to form the i'th pair of training data. Data-driven DL techniques use an NN model $R_\theta$ that is learned to fit the training data D to establish a mapping relationship from x to y, i.e., $R_\theta: x \to y$. A typical solution is to estimate the NN parameters θ that fit the given D through MLE,26,38

$$\theta^* = \arg\max_\theta P(D|\theta) = \arg\max_\theta P(x, y|\theta) = \arg\max_\theta P(y|x,\theta)P(x|\theta) = \arg\max_\theta P(y|x,\theta)P(x) = \arg\max_\theta P(y|x,\theta), \tag{7}$$

where the likelihood P(y|x,θ) measures the probability of observing the output y in D given the input x and NN parameters θ. A reasonable assumption is $P(y|x,\theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(R_\theta(x)-y)^2}{2\sigma^2}\right)$, considering that the NN prediction $R_\theta(x)$ is an estimate of the given y. Supposing that the training data D are independent and identically distributed (IID), we have

$$\theta^* = \arg\max_\theta \prod_{i=1}^{N} P(y_i|x_i,\theta) = \arg\max_\theta \log \prod_{i=1}^{N} P(y_i|x_i,\theta) = \arg\max_\theta \log \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(R_\theta(x_i)-y_i)^2}{2\sigma^2}\right) = \arg\min_\theta \frac{1}{N}\sum_{i=1}^{N} \bigl(R_\theta(x_i) - y_i\bigr)^2, \tag{8}$$

where the function $L(\theta) = \frac{1}{N}\sum_{i=1}^{N} (R_\theta(x_i) - y_i)^2$ is defined as the loss function that quantifies the discrepancy between the network predictions $R_\theta(x_i)$ and the actual outcomes $y_i$ across the training dataset D. The way to reduce the value of L(θ) is to optimize θ iteratively by using gradient descent (GD) through

$$\theta_{t+1} = \theta_t - \eta \nabla_\theta L, \tag{9}$$

where $\nabla_\theta L$ is the gradient of L with respect to the elements in θ, and η denotes the learning rate, which controls the step size during the optimization process. However, the actual calculation is more complex than the formula might initially suggest. The complexity arises because the calculation must encompass the entire training dataset, which can be quite extensive. In addition, the parameters in θ have a multilayered architecture, which adds to the computational challenge. This means that optimizing θ to minimize the loss function is a high-dimensional and often nonconvex optimization problem, which requires sophisticated algorithms and computational resources to solve effectively. The challenges posed by the large-scale training data and the multilayered nature of the parameters are addressed through two key methodologies: (minibatch) stochastic gradient descent (SGD)59 and back propagation (BP).60

In SGD, only a minibatch of the training data is used to calculate the loss function L, i.e.,

$$L = \frac{1}{B}\sum_{m=1}^{B} \bigl(R_\theta(x_m) - y_m\bigr)^2. \tag{10}$$

The selection of batch size B typically ranges from a single data point to several hundred, and it is usually kept constant even as the dataset size N increases. During each training iteration, a subset of B training data pairs is randomly sampled from D to compute the loss function. Each subset of B training data pairs contains unique data points, and once all training data have been passed through, we complete one training epoch. Generally, training requires dozens to hundreds of epochs to reach satisfactory performance. The SGD method is advantageous as it approximates the expected gradient/loss using only a small batch of data, which reduces the computational demands for deploying DL on large datasets. Popular optimization algorithms, such as Adam, are enhanced versions of SGD, designed to achieve more robust and efficient convergence.59
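The interplay of Eqs. (9) and (10) amounts to the loop sketched below; `grad_fn` is a hypothetical gradient oracle (in practice supplied by backpropagation, discussed next), and the hyperparameter values are illustrative.

```python
# One epoch of minibatch SGD, combining the update rule of Eq. (9) with the
# minibatch loss of Eq. (10). grad_fn is a placeholder gradient oracle.
import numpy as np

def sgd_epoch(theta, x_train, y_train, grad_fn, lr=1e-3, batch_size=32):
    """One pass over the training set with randomly sampled minibatches."""
    N = len(x_train)
    perm = np.random.permutation(N)              # shuffle once per epoch
    for start in range(0, N, batch_size):
        idx = perm[start:start + batch_size]     # unique samples per minibatch
        grad = grad_fn(theta, x_train[idx], y_train[idx])
        theta = theta - lr * grad                # gradient descent step
    return theta
```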

BP is an effective algorithm for calculating the gradient of the loss function with respect to each parameter in θ, i.e., $\nabla_\theta L$. It achieves this using the chain rule of calculus to propagate the error backward through the layers of the NN, enabling the computation of the gradients essential for updating the parameters in SGD. During the forward pass, BP stores intermediate results that are reused in the backward pass to compute gradients. This approach enhances computational efficiency but does require substantial memory resources. Figure 4 summarizes the underlying mathematical principles of BP. Since its introduction in 1986, BP has become a fundamental and indispensable algorithm in DL.60 It is worth noting that the abbreviation BP can also refer to beam propagation in optics, which describes the propagation of a light field from the measurement plane to the object plane. Although the same abbreviation is used in both contexts, the meaning is typically clear from context and should not cause confusion.
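As a minimal illustration of how the chain rule reuses forward-pass intermediates, the following numpy sketch (ours, not the paper's) computes the forward and backward passes of a two-layer network under a mean-squared-error loss.

```python
# Manual backpropagation through a two-layer network with MSE loss.
import numpy as np

def forward_backward(x, y, W0, b0, W1, b1):
    # Forward pass: intermediate a0 is stored for reuse in the backward pass
    z0 = W0 @ x + b0
    a0 = np.tanh(z0)
    y_hat = W1 @ a0 + b1                 # linear output layer
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass: propagate the error from the output layer backward
    d_yhat = y_hat - y                   # dL/dy_hat
    dW1, db1 = np.outer(d_yhat, a0), d_yhat
    d_a0 = W1.T @ d_yhat                 # chain rule through the output layer
    d_z0 = d_a0 * (1 - a0 ** 2)          # chain rule through tanh
    dW0, db0 = np.outer(d_z0, x), d_z0
    return loss, (dW0, db0, dW1, db1)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(2)
W0, b0 = rng.standard_normal((8, 4)), np.zeros(8)
W1, b1 = rng.standard_normal((2, 8)), np.zeros(2)
loss, grads = forward_backward(x, y, W0, b0, W1, b1)
```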

In the 1980s, researchers recognized that deep neural network models could approximate any function,58 and foundational algorithms such as SGD and BP could be used to train these models, leading to a surge of interest in DL. However, in the 1990s, the field experienced a period of stagnation due to several significant challenges. First, the computational resources required to train neural networks were immense, which presented a major barrier given the hardware limitations of the era. Second, multilayer networks encountered the problem of vanishing gradient, making it difficult to train deeper networks and limiting their ability to represent complex features. Finally, the absence of large labeled datasets hindered the development of models with good generalization capabilities. These hurdles prevented the anticipated breakthroughs from materializing, leading to a decline in enthusiasm for DL. Meanwhile, statistical machine learning methods, such as SVM, made considerable advancements, prompting many researchers to shift their focus away from neural networks in favor of these alternative approaches.61

In the 2010s, the contemporary resurgence of DL was largely driven by four pivotal breakthroughs. First, improvements in computational capabilities, particularly the widespread adoption of GPUs, dramatically expedited DL model training. Second, the establishment of extensive, well-labeled datasets provided DL models with rich training resources, which in turn fostered more robust pattern recognition and generalization ability. Third, innovations in NN architectures, such as convolutional neural networks (CNNs),62 recurrent neural networks (RNNs),63 and the advent of Transformers,64 empowered DL to process and learn more effectively from intricate, large-scale data. Finally, platforms such as TensorFlow and PyTorch have democratized DL by streamlining the process of implementation and experimentation across various domains. Together, these innovations have led to the rapid evolution of DL over the last 10 years. Figure 5 delineates the essential elements of contemporary DL, which encompass network architecture, neuron functionality, internal links, valuable components, and loss metrics. Fine-tuning these elements for specific applications can significantly amplify the efficacy of DL models.

Figure 5. Key components of modern deep learning. (a) Neural network architectures; (b) individual neurons; (c) different connection ways of neurons; (d) regularization strategies and useful tricks; (e) loss function types.

4 Deep Learning as an Imaging Solver

In 1985, Hopfield and Tank demonstrated that neural networks could be harnessed to address complex optimization problems.65 These findings greatly influenced the implementation of neural networks in the fields of optical information processing.66 Since the 1980s, seminal studies have illustrated the potential of neural networks in optical computing,67,68 image restoration,69 computer-generated hologram design,70 phase unwrapping,71 fringe pattern analysis,72 and digital holography.73 However, research in this domain experienced a hiatus from the turn of the millennium primarily due to limitations in computational power, the absence of efficient models, and the scarcity of large datasets required for model training.74

The field took a remarkable turn in 2012 with the advent of AlexNet,75 a DL model that achieved a landmark milestone by outperforming conventional machine learning algorithms in the ImageNet large-scale visual recognition challenge (ILSVRC). This pivotal event thrust DL into the limelight within the realm of artificial intelligence. The momentum continued in 2016 when the DL-powered system AlphaGo76 captivated the world by triumphing over elite human players in the game of Go, amplifying the curiosity and enthusiasm of researchers from various disciplines toward DL technologies.

Advancements in DL have had an immediate and profound impact on CI. In 2015, inspired by Tian and Waller77 and the rapid advancements in DL, Kamilov et al.78 introduced a DL-based algorithm for imaging 3D phase objects within a tomographic configuration. This innovative approach effectively removed missing cone artifacts and eliminated parasitic granular structures, yielding images of higher quality than those of other tomographic reconstruction methods. In 2017, Sinha et al.79 exhibited the merits of DL in improving the speed of lensless imaging systems. At approximately the same time, Rivenson et al.80 leveraged DL to significantly enhance the capabilities of a microscopic imaging setup, demonstrating the prowess of DL in holographic imaging by quickly mitigating twin-image and self-interference-related spatial artifacts.81 Furthermore, in 2017, Lyu et al.82 experimentally demonstrated the potential of DL in computational imaging by significantly reducing the number of samplings required in ghost imaging and enabling imaging through optically dense scattering media.83

Building on these advancements, the transformative impact of DL on the field of CI has been profound, paving the way for innovative approaches and significant progress.24,27,84 The swift adoption of DL across various branches of CI is a testament to its versatility and effectiveness. In a remarkably short span of time, DL has achieved notable success in areas such as phase recovery,29,31,81,85–90 ghost imaging,33,82,91–95 microscopy,11,96–101 endoscopy,102 imaging through scattering media,83,103–109 light-field imaging,110–112 lensless imaging,79,113–116 tomography,78,117–122 spectral imaging,32,123–125 and optical metrology,30,126–129 among many others. These DL-based CI methods, although diverse in purpose and technique, predominantly employ a data-driven DL strategy. The application of data-driven DL in CI has the potential to reduce sampling requirements, accelerate reconstruction processes, enhance imaging quality, and eliminate the need for complex forward modeling. However, challenges remain, including issues related to training data acquisition, network training, generalization, and interpretability. This section will explore how data-driven DL is applied in CI, highlighting its advantages and addressing its limitations.

4.1 Implementations of Data-Driven Learning

DL is inherently flexible in terms of the data types that it can process at both the input and output layers, allowing its broad application across various CI problems. As depicted in Fig. 6(a), one can straightforwardly use the data in the measurement space, denoted as $g_i$, as the input for the DL model and the corresponding ground truth in the object space, $f_i$, as the label. A total of N such paired datasets form the training set $D = \{(g_i, f_i)\ |\ i = 1, \ldots, N\}$. Subsequently, an NN model $R_\theta$ can be trained to fit D through the optimization problem

$$\theta^* = \arg\min_\theta \sum_i L\bigl(R_\theta(g_i), f_i\bigr). \tag{11}$$
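In code, Eq. (11) reduces to a standard supervised training loop. The PyTorch sketch below is purely illustrative: the fully connected reconstructor, the random stand-in tensors, and all hyperparameters are our assumptions, not the paper's implementation.

```python
# Supervised, data-driven training of R_theta on pairs (g_i, f_i), cf. Eq. (11).
import torch
import torch.nn as nn

m, n = 64, 256                                   # measurement / object sizes
model = nn.Sequential(                           # toy reconstructor R_theta
    nn.Linear(m, 256), nn.ReLU(), nn.Linear(256, n))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

g_train = torch.randn(1000, m)                   # stand-in measurements {g_i}
f_train = torch.randn(1000, n)                   # stand-in ground truths {f_i}

for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(g_train), f_train)      # L(R_theta(g_i), f_i)
    loss.backward()
    opt.step()

# Once trained, reconstruction is a single forward pass, cf. Fig. 6(b)
f_star = model(torch.randn(1, m))
```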

Figure 6. Data-driven deep learning for the inverse problem in computational imaging. (a) Training of neural networks using a large amount of paired data in the measurement space and the object space through supervised learning. (b) Using the trained network model to establish the mapping from the image space to object space and provide predictions of the object image for unseen test data.

Once the training phase is complete, the resulting model $R_{\theta^*}$ serves as a robust tool for reconstructing the object image from an unseen measurement $g_t$. As shown in Fig. 6(b), by feeding $g_t$ into $R_{\theta^*}$, one can obtain an estimate of the actual object image $f_t$, that is, $f^* = R_{\theta^*}(g_t)$. Typically, the trained model yields accurate results ($f^* \approx f_t$) because it has internalized implicit prior information from the dataset D, thus effectively addressing the ill-posed nature of the inverse problem. This approach is essentially data-driven, relying exclusively on the training data D to construct the reconstruction model $R_{\theta^*}$. Thus, by curating diverse training datasets, DL can be applied to a multitude of inverse problems in CI. The common framework for these data-driven DL-based CI applications can generally be condensed into four steps.

Before model training, raw data must undergo several preprocessing steps to ensure compatibility with neural networks and enhance learning efficiency. Typical preprocessing operations include normalization or standardization of pixel intensities (e.g., scaling to the [0, 1] or [−1, 1] range), denoising to reduce sensor-induced artifacts, resizing to ensure consistent input dimensions, and format conversion to meet the architectural input requirements. These steps are critical for improving numerical stability and facilitating model convergence. To further enhance model robustness and generalization, especially when the dataset is small, data augmentation techniques are widely used. Common strategies include random rotations, flips, cropping, scaling, and intensity perturbations, which effectively simulate variations in experimental conditions and imaging geometries.

Finally, the preprocessed and augmented dataset is typically partitioned into training, validation, and test sets according to a predefined ratio. The training set enables optimization of neural network weights, the validation set is used for hyperparameter tuning, and the test set evaluates the model’s ability to generalize to new, unseen data.
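A minimal sketch of these preprocessing, augmentation, and partitioning steps follows; the specific transforms and the 80/10/10 split are common defaults rather than prescriptions from the paper.

```python
# Illustrative preprocessing, augmentation, and dataset partitioning.
import numpy as np

def preprocess(images):
    """Normalize pixel intensities to the [0, 1] range."""
    images = images.astype(np.float32)
    return (images - images.min()) / (images.max() - images.min() + 1e-8)

def augment(image, rng):
    """Random flips and 90-deg rotations simulate varied imaging geometries."""
    if rng.random() < 0.5:
        image = np.fliplr(image)
    return np.rot90(image, k=int(rng.integers(0, 4)))

def split(data, ratios=(0.8, 0.1, 0.1), seed=0):
    """Partition into training, validation, and test sets."""
    idx = np.random.default_rng(seed).permutation(len(data))
    n_tr, n_va = int(ratios[0] * len(data)), int(ratios[1] * len(data))
    return data[idx[:n_tr]], data[idx[n_tr:n_tr + n_va]], data[idx[n_tr + n_va:]]
```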

Adherence to these four steps facilitates the completion of various CI tasks utilizing DL techniques. Variations among related studies are the result of different research interests, training data acquisition methods, network architectural choices, and loss function designs. This methodology, which relies solely on large amounts of training data to configure network parameters without explicitly incorporating the physical principles of the imaging system, is commonly referred to as a data-driven DL approach.

4.2 Advantages of Data-Driven Learning

To improve the imaging quality of CI systems, two main approaches are typically considered: (1) increasing the number of measurements to capture more information about the object and (2) introducing priors to allow the reconstruction of additional object information given the available measurements. Traditional iterative algorithms usually integrate prior knowledge through explicitly defined regularization terms or constraints, such as sparsity, smoothness, support, or nonnegativity. However, certain implicit priors, such as “the imaged scene exhibits cellular morphology,” are challenging to describe mathematically.

By contrast, data-driven DL approximates the true data distribution via maximum likelihood estimation, capturing such implicit features directly from the data. For example, facial and cellular data exhibit different distributions, and the specific distributions learned by DL models can be treated as implicit priors that help differentiate facial data from cellular data. Thus, by leveraging implicit priors learned directly from data, which are difficult to express explicitly, DL overcomes the limitations of traditional methods and offers more effective solutions for challenging CI tasks.

Due to its ability to incorporate more prior knowledge, data-driven DL can reduce sampling requirements while maintaining image quality, or improve image quality with the same amount of measurement data. Furthermore, because the computational burden is shifted to offline training, with only a fast online inference required at test time, DL outperforms traditional iterative algorithms in terms of imaging efficiency. Importantly, DL only requires access to data and does not need detailed knowledge of the imaging system, which makes it advantageous for solving CI problems with complex imaging channels. Figure 7 summarizes the advantages of data-driven DL in reducing sampling requirements, enhancing image quality, accelerating reconstruction processes, and eliminating the need for forward modeling.

Figure 7. Advantages of deep-learning-based computational imaging. Deep learning introduces rich implicit image priors from data through offline training and online testing, allowing computational imaging systems to reduce sampling rates, improve image reconstruction speed, enhance imaging quality, and avoid forward physical modeling. CT, computed tomography; FPP, fringe projection profilometry; SPI, single-pixel imaging; GS, Gerchberg–Saxton; CS, compressive sensing; ART, algebraic reconstruction technique.

4.2.1 Reduce sampling

In CI, increasing measurement diversity through various modulation modes is a common strategy to mitigate the ill-posedness of inverse reconstruction problems.22,39 Although effective in reducing uncertainty, this approach can prolong data acquisition times and complicate experimental setups, impacting imaging speed and increasing hardware costs. Data-driven DL enables high-quality images to be restored under limited sampling conditions, significantly reducing data acquisition time and system complexity. For example, in single-pixel imaging (SPI), DL facilitates high-quality image reconstruction even at sampling ratios as low as 5%.33,82 Similarly, in fringe projection profilometry (FPP), precise 3D reconstructions can be achieved with a single fringe pattern.30,126 In computed tomography (CT), high-quality images can be generated under limited-angle conditions.121,136 In the ptychographic iterative engine and Fourier ptychography, DL methods enable high-resolution image synthesis with a lower aperture overlap, significantly reducing the number of low-resolution images required for reconstruction.88,137,138

4.2.2 Speed up reconstruction

Traditional inverse reconstruction algorithms in CI incorporate prior information by designing explicit regularization terms to address ill-posed problems.23 These algorithms often require long reconstruction times to iteratively satisfy measurement and prior information constraints, especially for large-scale CI problems. By contrast, DL takes advantage of a multilayer neural network, Rθ, which learns from a large amount of data to establish a mapping model between the data, i.e., Rθ:gf. Although training Rθ can be time-consuming, once trained, the model can directly provide predictions through a single inference. This approach offers a significant advantage over traditional iterative optimization algorithms by eliminating the need for iterative computations, substantially enhancing the speed of reconstruction. For example, phase imaging with the Gerchberg–Saxton (GS) algorithm, compressive imaging using compressed sensing (CS) algorithms, and tomographic imaging with the algebraic reconstruction technique (ART) have experienced substantial improvements in reconstruction speed with the adoption of DL.79,81,82,91,120,121,139

4.2.3 Improve image quality

The rich prior information within trained neural network models allows the enhancement of image quality from low-quality inputs, substantially reducing the hardware demands of the imaging system. For example, data-driven DL is utilized in digital staining, where traditional quantitative phase imaging results are processed to emulate fluorescence staining results,140,141 avoiding the complexities of manual staining and its effects on samples. DL is also prominent in image denoising, with applications in microscopy that yield noise-free, high-quality super-resolution images at low signal-to-noise ratios (SNRs) as a post-processing method.11,97,99 This contributes to reduced phototoxicity effects, enabling extended observation of live cells. In addition, research is focused on improving the quality of images from unconventional lenses, using ultra-compact imaging systems to capture low-quality image information that is subsequently enhanced by DL models to match the results of more complex systems.116,142–144

4.2.4 Avoid forward modeling

Traditional CI techniques usually require forward physical modeling of the imaging system before inverse reconstruction can be performed.22,23,39 However, the presence of random scattering media and atmospheric turbulence along the imaging path makes direct measurement of these uncontrollable disturbances challenging.145 Data-driven DL circumvents this challenge by learning inverse mapping models through neural networks, without the need for analytical forward process modeling. This capability allows for high-quality imaging results even in the presence of complex disturbances, which overcomes challenges in traditional methods. For example, DL has facilitated imaging through optically thick scattering media103–108,146,147 and pushed the boundaries of numerical mode decomposition in multimode fibers.148–151 DL has also provided novel strategies for clear imaging under turbulent conditions152–154 and has been used in optical encryption to establish complex mappings between ciphertext and plaintext.155–157

4.3 Challenges of Data-Driven Learning

Although DL has been extensively applied in CI, several challenges remain. These include difficulties in acquiring sufficient training data, generalization issues, the lack of interpretability in DL models, and the substantial computational resources required for training, as highlighted in Fig. 8.

Figure 8. Challenges of deep-learning-based computational imaging. Deep learning requires large amounts of data to train a multi-layer neural network, resulting in difficulties in (a) acquiring training data, (b) high computational complexity, (c) poor generalization, and (d) low interpretability.

4.3.1 Training data acquisition

Data-driven DL-based CI has become a standard paradigm, with the methodology of collecting training data pairs being a primary differentiator between applications. Unlike DL-based object recognition, which relies on manual labeling, DL-based CI requires the continuous change of objects within the imaging system to gather diverse data. This process presents significant challenges, particularly in fields such as medical diagnosis144 and astronomy,158 where creating a diverse dataset is challenging. Furthermore, obtaining high-quality ground truth images often requires extensive sampling, complex imaging systems, and lengthy computational processing. The distribution of acquired data is also closely tied to specific imaging systems, necessitating data re-collection when system parameters change, further complicating the process.

4.3.2 Generalization

Most traditional optimization algorithms in CI rely on general image prior assumptions, such as smoothness or sparsity, to constrain the optimization process and guide it toward a feasible solution. These priors can be controlled via a tunable regularization parameter, making conventional optimization methods flexible and applicable across a wide range of object types and system configurations. However, data-driven DL-based CI requires training on a specific dataset, which means that trained models may exhibit a bias toward the training data. As a result, when tasked with reconstructing objects that differ significantly from the training set, these models may produce unrealistic or impractical results. Ideally, a well-trained model can effectively reconstruct data from the same distribution. However, when exposed to data from different distributions—such as varied object types or imaging system configurations—the model’s ability to generalize may be compromised, leading to a deterioration in reconstruction quality. Conventional methods typically tackle the generalization issue through statistical distribution matching, which includes curating large representative training datasets,104,108,131,159,160 designing adaptable network architectures,146,161 and utilizing more effective loss functions to improve model robustness.162 However, the no-free-lunch theorem163 suggests that a universal optimization strategy is theoretically impossible, implying that the best reconstruction algorithm is specific to a particular problem. In practical imaging scenarios, the diversity of targets and system configurations often exceeds the scope of training data, limiting the real-world applications of data-driven DL-based CI methods.

4.3.3 Interpretability

Despite breakthroughs in various fields and a clear understanding of the internal structure and training strategies of neural networks, there is no strict theoretical guidance on designing neural network architectures or training better models. The trained model is often seen as a black box that produces good results but is challenging to comprehend. Existing DL techniques typically require trial-and-error optimization of neural network hyperparameters, which, due to the vast range of choices, can be computationally and temporally costly. In many cases, although a trained model may yield satisfactory results, it does not enhance our understanding of the underlying problem. The lack of interpretability can raise concerns regarding the fidelity of images obtained through DL-based CI, especially in fields where interpretability is critical, such as scientific discovery and medical diagnosis.164,165

4.3.4 Computing requirements

In optical imaging, the increasing number of pixels on image sensors for better visual effects, along with the need to capture more information, such as depth and spectrum, increases the amount of data collected and reconstructed. This poses challenges for data processing, particularly when using DL-based methods that require extensive computation during training. A recent study has shown that the demand for DL computing power is approaching the limit of existing chips.166 With the end of Moore’s law,167 the advancement of computing power may not keep pace with demand. Moreover, the limitations of DL in generalization and interpretability may further increase computational requirements because different models may need to be trained on various datasets for different situations or repeatedly trained on the same dataset with different hyperparameters to improve performance.38

5 Physics-Enhanced Deep Learning for Computational Imaging

Given the challenges associated with traditional data-driven DL in CI, there has been a notable change in recent research. An increasing number of studies are exploring the integration of physical knowledge of CI systems with DL methodologies.168–170 This emerging approach, termed physics-enhanced DL or physics-informed DL, combines physics priors with training data, neural network architectures, and loss functions. The aim is to provide innovative solutions to the challenges of data acquisition, computational demand, generalization, and interpretability that are inherent in data-driven DL.

Figure 9 offers a comprehensive summary of the various types of physics-enhanced DL-based CI methods that will be detailed in this paper.

Figure 9. Categorizations of physics-enhanced deep-learning-based computational imaging. The integration of physics priors, consisting of the image formation model and the inverse restoration criterion, with deep learning is reflected in the three fundamental elements of deep learning: data, network, and loss function.

5.1 Physics Prior

In contrast to traditional DL applications such as face recognition and image segmentation, where the relationship between inputs and labels is human-defined and lacks physical context, CI offers a different scenario. In CI, the inputs and labels are directly linked to the measurements and objects within the imaging system. This link can be established through the use of physics priors.

In CI, physics priors are primarily composed of two elements: the knowledge required to construct the forward physical model H and the criteria to employ H in inverse reconstruction. First, the forward model represents the mathematical formulation of the imaging system, delineating the relationship between the object f and its corresponding measurement g, i.e., H:fg. The construction of the forward model demands a thorough understanding of the components of the CI system, such as the light source, transfer function, and detector sampling. For example, in phase imaging, the forward model is based on diffraction theory, taking into account parameters such as wavelength, diffraction distance, and pixel size.171 In ghost imaging, the model is derived from the actual speckle pattern on the object’s surface.172 Similarly, in microscopy, the model is constructed considering the illumination mode and the numerical aperture of the objective lens.173

Second, the inverse reconstruction criterion dictates the process of recovering the target object f from the measurement signal g. Inverse reconstruction typically employs analytical or iterative methods, based on a deep understanding of the imaging system and the synergistic use of the measurement signal g with the forward model H. For example, in phase imaging, phase information is retrieved from the diffraction intensity pattern through an alternating projection algorithm.174 In ghost imaging, the object image is derived from the correlation between intensity fluctuations and speckle patterns.172 In microscopy, high-resolution images are achieved through deconvolution, using low-resolution images and the point spread function of the system.57
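As one concrete instance of such a criterion, the sketch below simulates correlation-based ghost-imaging reconstruction: the object is recovered by correlating bucket-signal fluctuations with the known speckle patterns. The object, pattern statistics, and sample count are toy assumptions for illustration.

```python
# Correlation-based ghost-imaging reconstruction (toy simulation).
import numpy as np

rng = np.random.default_rng(0)
K, h, w = 5000, 32, 32
patterns = rng.random((K, h, w))                  # known speckle patterns I_k
f = np.zeros((h, w))
f[8:24, 8:24] = 1.0                               # toy object

bucket = (patterns * f).sum(axis=(1, 2))          # single-pixel signals g_k

# Correlate intensity fluctuations with the patterns: f* ~ <(g_k - <g>) I_k>
f_star = np.tensordot(bucket - bucket.mean(), patterns, axes=1) / K
```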

These aspects of the physics prior have been integral to the evolution of CI and continue to be refined.23,39 Although data-driven DL is potent, ignoring these physics priors is not advisable. Thus, integrating DL with explicit physics priors is essential to harness their combined strengths.

Incorporation of physics priors into DL has attracted significant interest, with an increasing number of studies focusing on physics-enhanced DL approaches.90,170 As DL encompasses key components such as data, networks, and loss functions, the integration of physics priors with DL is manifested in these areas. We will now explore each of these aspects in detail.

5.2 Incorporating Physics in the Data

The integration of physics prior with data predominantly involves employing the forward physical model to generate input data for neural networks. This can be achieved through learning from simulations [Fig. 10(a)], end-to-end optical design [Fig. 10(b)], or by applying the forward physical model to refine network outputs in a plug-and-play (PnP) fashion [Fig. 10(c)].

Figure 10. Integration of physics priors with input–output data. (a) Generation of training data using a fixed physical model; (b) generation of training data using a parameterized physical model, where the parameters of the physical model are optimized along with the weights of the neural network during training; (c) utilization of the physical model to process the output of a pretrained denoising network.

5.2.1 Learning from simulation

The most direct application of physics priors in DL-based CI is to generate synthetic training data using the physical model. With numerous datasets readily available, it is expedient to adopt images from these datasets as the imaging targets $D=\{f^{(k)}\}_{k=1}^{K}$, which effectively serve as labels. Subsequently, the forward physical model H of a given CI system can be used to calculate the corresponding synthetic measurements $\{g^{(k)}\}_{k=1}^{K}=H(\{f^{(k)}\}_{k=1}^{K})$. This allows for supervised training of the neural network entirely in simulation,
$$\theta^{*}=\arg\min_{\theta}\mathcal{L}\big(R_{\theta}(H(f_k)),f_k\big),\quad f_k\in D,$$
which eliminates the need for acquiring experimental training data.
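
A minimal training-loop sketch of this idea is given below, assuming a differentiable `forward_model` standing in for H, a reconstruction network `recon_net` (e.g., a U-Net), and a `target_loader` yielding batches of images from an existing dataset; all names and hyperparameters are hypothetical.

```python
import torch

def train_from_simulation(recon_net, forward_model, target_loader, epochs=50):
    """Learning from simulation: synthesize measurements g = H(f) on the fly
    from dataset images f and train the reconstruction network on the pairs."""
    opt = torch.optim.Adam(recon_net.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for f in target_loader:               # f: a batch of ground-truth images
            g = forward_model(f)              # synthetic measurement via physics
            loss = loss_fn(recon_net(g), f)   # image-domain mismatch
            opt.zero_grad(); loss.backward(); opt.step()
    return recon_net
```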

This approach, although simple, has been widely applied in CI (see Appendix for some examples). However, it rests on two critical prerequisites. First, the forward operator H of the imaging system in use must be well defined so that simulated data can stand in for experimental data. Because the imaging model only approximates the real image formation process, a rigorous numerical calibration of H that accounts for factors such as camera noise, lens aberration, and equipment error is essential to mitigate this mismatch.89,121,175,176,290–292 Even so, owing to domain shift, models trained on a substantial amount of real data often perform better in experimental settings than those trained on simulated data alone.293

Second, the imaging targets intended for simulation should closely resemble real objects. As previously discussed in Sec. 4.3, models trained on specific datasets tend to be biased toward data of a similar distribution. Therefore, having prior knowledge about the real object’s structure is crucial for selecting appropriate images from a dataset for numerical simulations. It should be highlighted that in scenarios where an accurate forward model cannot be constructed or the imaging targets’ content is unknown, learning from simulation still offers a viable strategy for parameter initialization. This can be achieved through pretraining on simulated data followed by fine-tuning on experimental data. Such a transfer learning approach effectively reduces the dependency on the quantity of experimental training data.294

5.2.2 End-to-end optical design

Unlike learning from simulation, which utilizes a fixed forward physical model, end-to-end optical design integrates adjustable parameters in the forward model as trainable weights, alongside those of the image reconstruction network. These weights are concurrently optimized, as articulated by the objective function
$$\theta^{*},H^{*}=\arg\min_{\theta,H}\mathcal{L}\big(R_{\theta}(H(f_k)),f_k\big),\quad f_k\in D.$$
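
The following sketch illustrates this joint optimization under a deliberately simplified assumption that the encoder is a single trainable transmittance mask applied multiplicatively to the object; a real system would substitute a differentiable model of its actual encoding optics, and all names here are hypothetical.

```python
import torch
import torch.nn as nn

class OptoElectronicAutoEncoder(nn.Module):
    """Sketch of end-to-end optical design: a trainable coded aperture (the
    'optical' encoding layer) followed by an electronic reconstruction network.
    The multiplicative-mask measurement model is a simplifying assumption."""
    def __init__(self, decoder, size=64):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.randn(1, 1, size, size))
        self.decoder = decoder                      # e.g., a U-Net, assumed given
    def forward(self, f):                           # f: (batch, 1, size, size)
        mask = torch.sigmoid(self.mask_logits)      # keep transmittance in [0, 1]
        g = mask * f                                # simulated measurement g = H(f)
        return self.decoder(g)

# Training against an image-domain loss sends gradients into both the decoder
# weights and mask_logits, so the encoding scheme is co-designed with R_theta.
```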

This strategy can also be construed as a hybrid opto-electronic autoencoder framework, where the encoding layer represents the CI system’s forward physical model H, the bottleneck layer processes the raw measurements, and the decoding layer is the image reconstruction neural network Rθ. This method has gained significant attention as it promotes adaptive parameterization of CI systems to enhance imaging performance on specific datasets,295 and it has also been widely discussed in broader photonics design.296 Because the optimization is not bound by humanly interpretable design heuristics, this approach often surpasses traditional, manually designed encoding schemes. As a result, end-to-end optical design has been extensively applied across various fields, including computational photography, ghost imaging, phase imaging, and compressive spectral imaging91,177–182,297 (see Appendix).

A key consideration in this approach is the design of the encoding layer, which must not only be grounded in a specific forward physical model but also account for the feasibility of implementing the optimized encoding scheme. CI systems are often constrained by the capabilities of their optical components and cannot accommodate arbitrary encoding modes. Two primary strategies address this challenge: first, by applying differentiable constraints during optimization to ensure that the encoding scheme remains within the feasible range of the system’s optical components, and second, by performing unconstrained optimization and then adjusting the optimized encoding scheme to a form compatible with the system’s components, followed by fine-tuning the decoding layer’s parameters while keeping the encoding layer fixed.
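
As an example of the first strategy, the straight-through estimator below is one common way to keep a trainable mask binary (as required by, say, a digital micromirror device) while still passing gradients during end-to-end optimization; this is a generic trick offered as a sketch, not a method prescribed by the references above.

```python
import torch

def binarize_ste(logits):
    """Straight-through estimator: the forward pass uses the hard binary mask a
    DMD can actually display, while gradients flow through the smooth sigmoid."""
    soft = torch.sigmoid(logits)
    hard = (soft > 0.5).float()
    return hard + (soft - soft.detach())   # value: hard mask; gradient: soft mask
```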

5.2.3 Plug-and-play

Plug-and-play (PnP) algorithms are a class of iterative methods that substitute the proximal operator of the regularizer in the alternating direction method of multipliers (ADMM) or similar proximal algorithms with a more flexible denoiser.298 With the recent progress in DL-based denoising techniques, PnP algorithms have increasingly incorporated pretrained denoising neural networks into their denoising steps. As illustrated in Fig. 10(c), the use of a physical model to refine the output of the neural network in each iteration positions PnP algorithms that employ learned denoisers at the intersection of physics-based priors and data-driven approaches.

The PnP framework allows for the combination of diverse physical models and denoisers, making it versatile for a range of image reconstruction challenges. It has demonstrated significant success on various CI tasks,299 as highlighted in Fig. 18 (see the Appendix). The outputs of the denoising network are constrained by the physical model, which endows PnP algorithms with superior generalization capabilities compared to purely data-driven DL methods.183,184 Nonetheless, although some studies have made strides in accelerating PnP algorithms,185,300 their iterative nature (requiring the use of a pretrained denoising model in each cycle) still poses challenges for real-time imaging applications. In addition, the imaging quality is contingent upon the denoising network’s ability to generalize. Consequently, the development of lightweight and broadly effective denoising models stands as a pivotal area of research for advancing PnP algorithms.299
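
A minimal sketch of the PnP idea follows, using a half-quadratic-splitting-style alternation rather than full ADMM and assuming, for simplicity, a linear forward model given as a matrix `A` and a pretrained `denoiser` callable; reshaping the vector into an image for the denoiser is omitted.

```python
import numpy as np

def pnp_reconstruction(g, A, denoiser, rho=1.0, iters=50):
    """Plug-and-play sketch: alternate a data-fidelity step under the physical
    model A with a pretrained denoiser standing in for the regularizer's
    proximal operator."""
    n = A.shape[1]
    x = A.T @ g                                      # crude back-projection init
    AtA = A.T @ A
    for _ in range(iters):
        # data step: argmin_z ||A z - g||^2 + rho ||z - x||^2
        z = np.linalg.solve(AtA + rho * np.eye(n), A.T @ g + rho * x)
        x = denoiser(z)                              # learned prior step
    return x
```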

5.3 Incorporating Physics in the Network

In DL, the choice of the network architecture for feature extraction is crucial to optimize practical performance. Various neural network architectures are tailored to harness the unique characteristics of different applications, leading to ongoing enhancements in DL’s capacity to address specific challenges. A prominent trend in DL-based CI is the integration of physics priors into the construction of neural network models. This strategy includes the explicit incorporation of forward physical models or inverse reconstruction criteria in the design of hidden layers, encompassing methods such as diffractive deep neural networks, interpretable physical decoding layers, and techniques such as unrolling or unfolding.

5.3.1 Diffractive deep neural networks

As illustrated in Fig. 11(a), diffractive deep neural networks (D2NNs) consist of stacked layers of diffractive optical elements, arranged in analogy to an artificial neural network, that can be trained to perform complex functions at the speed of light. They have been widely used in image recognition tasks by mapping different light intensity distributions to spatial positions on the camera that represent different categories.186 Recently, they have also been applied to CI tasks, including phase retrieval,187–189 imaging through scattering media,190 and image super-resolution191 (see Appendix). This approach exploits a pretrained D2NN for decoding during the image formation process, directly capturing object information with a camera. This results in significant advancements in computer-free CI with ultra-high inference speed and ultra-low power consumption, leading toward non-von-Neumann neuromorphic computing.
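
In simulation, a D2NN layer can be sketched as a trainable phase mask followed by free-space propagation, as below; the propagation transfer function is assumed to be precomputed (e.g., by the angular spectrum method), and the masks are trained digitally before being fabricated. All names are illustrative.

```python
import torch
import torch.nn as nn

def propagate(field, transfer):
    """Free-space propagation between diffractive layers; `transfer` is a
    precomputed frequency-domain transfer function (complex tensor)."""
    return torch.fft.ifft2(torch.fft.fft2(field) * transfer)

class DiffractiveLayer(nn.Module):
    """One D2NN layer: a trainable phase-only mask followed by propagation."""
    def __init__(self, size, transfer):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(size, size))  # trainable phase delays
        self.register_buffer("transfer", transfer)
    def forward(self, field):                               # field: complex tensor
        return propagate(field * torch.exp(1j * self.phase), self.transfer)

# A D2NN stacks several such layers; the camera records |field|^2 at the output,
# and the phase masks are trained in simulation before optical fabrication.
```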

Construction of neural networks using physics priors. (a) Construction of a diffraction neural network using a diffraction propagation model; (b) addition of physically meaningful feature extraction layers to traditional neural network architectures; (c) unfolding physics-driven iterative optimization algorithms into neural networks, where each layer of the network involves computation using the physical model.

Figure 11.Construction of neural networks using physics priors. (a) Construction of a diffraction neural network using a diffraction propagation model; (b) addition of physically meaningful feature extraction layers to traditional neural network architectures; (c) unfolding physics-driven iterative optimization algorithms into neural networks, where each layer of the network involves computation using the physical model.

However, there remains a gap between the practical imaging results achieved with D2NNs and those obtained using traditional neural network models. This discrepancy stems from limitations in integration density inherent to the all-optical construction of the network. In addition, some operations essential for building neural network models, such as nonlinear activation, batch normalization, and residual connections, are easily managed electrically but challenging to implement optically.301,302 Furthermore, as the final results are directly obtained from a camera, the practical application of this approach is also limited by the camera’s performance.303 Notably, research on using optical mechanisms to build optical neural networks has grown significantly in recent years. Many architectural concepts and implementations, including different neuron models, training techniques, and topologies, are being explored. Interested readers can refer to several excellent reviews for more detailed information.303–305

5.3.2 Interpretable layers

The hidden layers of neural networks are often perceived as a black box, a perception exacerbated by the lack of interpretability in their structural design. Embedding interpretable layers, such as conventional restoration algorithms and Fourier-inspired analysis, within the architecture of neural networks can potentially enhance the network’s capacity to learn from training data more effectively. As illustrated in Fig. 11(b), the approach of interpretable layers primarily encompasses the preprocessing of network inputs, the integration of physical principles within network modules, and the application of constraints before network output. Preprocessing with physics priors can adapt diverse measured signals into formats or features amenable to neural network analysis.82,107,109,175,192,292,306 The incorporation of physical principles in hidden layers facilitates more efficient feature representation.193,194,307 Imposing physical constraints on the network’s output ensures compliance with specific physical laws.126,308,309 Compared with conventional data-driven DL methodologies, the integration of interpretable layers can bolster the quality of the reconstructed image and enhance the generalizability of the network.193,195 This approach also increases the interpretability of the network architecture. However, it is important to recognize that interpretable layers do not automatically render the entire network interpretable, as traditional neural network components persist. The efficacy of integrating interpretable layers must be empirically validated. Nonetheless, DL methods augmented with interpretable layers have been successfully applied to many CI tasks, yielding commendable outcomes (see Appendix).
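
As one illustration of physics-based preprocessing, the sketch below computes the speckle autocorrelation via the Wiener-Khinchin theorem; within the optical memory effect range, this approximates the object autocorrelation and thus hands the network a largely medium-invariant feature, in the spirit of the speckle-correlation preprocessing cited above. The function is a hypothetical sketch, not a reference implementation.

```python
import torch

def autocorrelation_preprocess(speckle):
    """Physics-based preprocessing layer: within the memory effect, the speckle
    autocorrelation approximates the object autocorrelation, giving the network
    an input that is largely invariant to the scattering medium."""
    s = speckle - speckle.mean(dim=(-2, -1), keepdim=True)
    spectrum = torch.fft.fft2(s)
    autocorr = torch.fft.ifft2(spectrum * spectrum.conj()).real
    return torch.fft.fftshift(autocorr, dim=(-2, -1))
```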

5.3.3 Unrolling/Unfolding

This technique entails the utilization of a traditional model-based optimization step as a layer within the network. After N iterations, the resulting unrolled network comprises N layers.196,197 Within each layer, any differentiable parameters, inclusive of those associated with model discrepancies, calibration inaccuracies, and regularization factors, can be refined through backpropagation. This capability enables the network to autonomously discern optimal parameters, a task that conventional optimization algorithms typically rely on preset values to accomplish. The approach allows a modest number of iterations (layers) to produce a robust estimation, thereby facilitating real-time reconstruction.113 In addition, this innovative training paradigm significantly diminishes the quantity of trainable parameters compared with standard neural networks. Consequently, a reduced dataset is required for training, reducing the cost of implementing learning-based methodologies.113,181 Furthermore, as the unrolled network is constructed from knowledge of the imaging system and prior information about the object, the resulting network, grounded in physical principles, is more interpretable. Figure 18 presents examples of CI tasks that have capitalized on the unrolling/unfolding technique.
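
The sketch below unrolls a proximal-gradient-style iteration into a fixed number of layers, with a learnable step size and a small convolutional refinement block per layer; the specific architecture is an illustrative assumption, and A is again a linear forward model for simplicity.

```python
import torch
import torch.nn as nn

class UnrolledNet(nn.Module):
    """Unrolling sketch: N proximal-gradient-style iterations become N layers,
    each with a learnable step size and a small learned refinement block. The
    linear forward model A enters every layer explicitly."""
    def __init__(self, A, n_layers=8):
        super().__init__()
        self.register_buffer("A", A)                       # measurement matrix (m, n)
        self.steps = nn.Parameter(0.1 * torch.ones(n_layers))
        self.refiners = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
            for _ in range(n_layers))
    def forward(self, g, hw):                              # g: (batch, m); hw: (H, W)
        b = g.shape[0]
        x = (g @ self.A).reshape(b, 1, *hw)                # A^T g as initialization
        for step, refine in zip(self.steps, self.refiners):
            residual = x.reshape(b, -1) @ self.A.t() - g   # A x - g
            grad = (residual @ self.A).reshape(b, 1, *hw)  # A^T (A x - g)
            x = refine(x - step * grad)                    # gradient step + refinement
        return x
```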

5.4 Incorporating Physics in the Loss Function

Conventional data-driven DL-based CI typically relies on the image-domain mismatch (the discrepancy between the restored and ground truth images) to guide the optimization of neural network weights. Recently, research has shown that the measurement-domain mismatch (the error between the reproduced and raw measurements) can also be used to guide the optimization of NN weights.90,169 This strategy encompasses various implementations, including untrained, physics-driven fine-tuning, generative prior, self-supervised learning, and consistency regularization methods (see Fig. 12).


Figure 12.Using physics models for computing physics-consistency loss functions (PCLFs). (a) Optimizing parameters of a randomly initialized neural network using PCLF with current measurements; (b) fine-tuning a pretrained image reconstruction model using PCLF with current measurements; (c) optimizing the sampling vector of a pretrained generative model using PCLF with current measurements; (d) training a neural network using PCLF defined by measurements corresponding to multiple objects; (e) using PCLF defined by input measurements as a regularization term in traditional supervised deep learning.

5.4.1 Untrained neural networks

Although conventional data-driven DL uses NNs to learn implicit priors from training data, Ulyanov et al.310 demonstrated that a randomly initialized NN tends to generate natural images, suggesting that the NN architecture can serve as a crafted prior, i.e., the deep image prior (DIP). Building on this insight, Wang et al.90 integrated a physical model with an untrained NN to tackle CI inverse problems. The objective function for a physics-enhanced deep neural network, termed PhysenNet,90 is given by
$$\theta^{*}=\arg\min_{\theta}\mathcal{L}\big(H(R_{\theta}(g)),g\big).$$

PhysenNet relies solely on physical knowledge H and the raw measurements g, indicating that it does not require a training set. This untrained approach circumvents biases toward specific distributions, mitigating generalization issues inherent in data-driven DL. Moreover, the NN output adheres to physical model constraints, enhancing interpretability. PhysenNet represents a model-driven optimization algorithm but differs from conventional ones in two key aspects. First, it does not directly update the target image but instead updates the NN weights θ considering the output Rθ(g) as the estimated image. Second, it addresses ill-posed inverse problems without explicit regularization, leveraging an untrained NN to introduce implicit image priors, which have shown superiority over traditional handcrafted priors such as sparsity and smoothness.94,311 The robust implicit prior from a well-designed NN endows PhysenNet with the potential for superior performance on CI inverse problems.90,198 In addition, it reduces the need for detailed imaging system modeling when H is incorporated into the optimization objective.199,200
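
A minimal PhysenNet-style loop might look as follows, assuming a differentiable `forward_model` for H and an untrained network `net` that maps the measurement to an object estimate; names and hyperparameters are illustrative.

```python
import torch

def untrained_reconstruction(g, forward_model, net, iters=5000, lr=1e-4):
    """PhysenNet-style loop: optimize a randomly initialized network so that its
    output, pushed through the physical model H, reproduces the single
    measurement g. No training data are involved."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(iters):
        f_hat = net(g)                                   # current object estimate
        loss = torch.mean((forward_model(f_hat) - g) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
        # in practice, early stopping helps avoid fitting measurement noise
    return net(g).detach()
```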

Since 2020, the untrained approach has been extensively applied to CI tasks (see examples in Fig. 18). However, it encounters three notable challenges. First, the iterative process can be time-consuming, requiring updating all NN weights in each iteration. Second, there is a risk of overfitting the noise in measurements, necessitating an early stop at an appropriate iteration. Third, designing the architecture of the untrained NN remains an unresolved issue.

Methods such as deep decoders and neural fields (implicit neural representation, INR) have been proposed to address these challenges. The deep decoder approach replaces the over-parameterized network used in the original DIP method with an under-parameterized image-generating network, eliminating the need for early stopping or additional regularization.199,312 The emerging INR approach parameterizes the object using a coordinate-based deep network, representing the object through the parameters of a small-scale NN instead of dense voxel or pixel grids.43,313,314 This approach signifies a breakthrough from discrete to continuous representation and allows for regional reconstruction, making it suitable for large-scale imaging tasks.201–203,315
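
A minimal INR sketch with random Fourier features is given below; the feature count, network width, and frequency scale are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FourierINR(nn.Module):
    """Sketch of an implicit neural representation: a small MLP maps (x, y)
    coordinates, lifted by random Fourier features, to object values, so the
    object is stored in the MLP weights rather than in a pixel grid."""
    def __init__(self, n_features=128, width=256, scale=10.0):
        super().__init__()
        self.register_buffer("B", scale * torch.randn(2, n_features))
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_features, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1))
    def forward(self, coords):                      # coords: (N, 2) in [-1, 1]
        proj = 2 * torch.pi * coords @ self.B
        feats = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        return self.mlp(feats)

# Rendering any region is just evaluating the MLP on that region's coordinates,
# which is what makes region-by-region large-scale reconstruction possible.
```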

5.4.2 Physics-driven fine-tuning

Although the untrained method forgoes the use of training data, it may not fully leverage the implicit prior knowledge inherent in available datasets. To enhance the performance of inverse problem-solving, algorithms can be designed to harness both explicit physics priors and implicit priors from the data at hand. The physics-driven fine-tuning approach addresses inverse problems with a two-step methodology: data-driven pretraining followed by model-driven fine-tuning. During the data-driven pretraining phase, network parameters are optimized on a given dataset. Subsequently, in the model-driven fine-tuning phase, the pretrained network parameters are refined using a physics-consistency loss function during the testing phase, as
$$\theta_{p}^{*}=\arg\min_{\theta_{p}}\mathcal{L}\big(H(R_{\theta_{p}}(g)),g\big).\tag{15}$$

In this equation, the weights in $R_{\theta_{p}}$ are initialized by a pretrained model. It is important to note that the objective function in Eq. (15) mirrors the loss function of the “untrained” method. The distinction lies in the initialization of the network parameters, which are not randomly assigned but rather pretrained. This simple methodology offers several advantages. First, the pretrained network embodies prior information about the data distribution, and the fine-tuning process aligns the network’s output with the physical model’s constraints, effectively leveraging both physical and data-driven priors to tackle ill-posed inverse problems. Second, when the testing data are distributed similarly to the training data, an appropriately pretrained model can yield satisfactory results without additional fine-tuning. By contrast, when the testing data deviate from the training data distribution, fine-tuning mitigates reconstruction artifacts due to generalization challenges, thus balancing accuracy and efficiency. Third, commencing network optimization from a more advantageous starting point results in more stable and rapid convergence.204
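
In code, the fine-tuning phase mirrors the untrained loop sketched in Sec. 5.4.1 almost exactly; only the initialization differs, as the hypothetical snippet below indicates (the checkpoint file name and hyperparameters are assumptions).

```python
import torch

def physics_driven_finetune(g, forward_model, net, ckpt="pretrained_recon.pt",
                            iters=200, lr=1e-5):
    """Sketch of physics-driven fine-tuning: identical in form to the untrained
    loop, except the network starts from pretrained weights (Eq. 15) and
    typically needs far fewer, gentler update steps."""
    net.load_state_dict(torch.load(ckpt))          # assumed checkpoint file
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(iters):
        loss = torch.mean((forward_model(net(g)) - g) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    return net(g).detach()
```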

This approach parallels the principles of transfer learning.294 However, it distinctively addresses domain shift challenges with a physics-driven strategy, eliminating the need for target domain training data during fine-tuning. In addition, physical models significantly influence not only the fine-tuning process but also the data-driven pretraining phase, where they inform data generation, network architecture construction, and loss function formulation, thereby yielding superior pretrained models. This methodology was first demonstrated in single-pixel imaging95 and coherent diffraction imaging.205 It has also been extended to other CI tasks, including speckle correlation imaging,206 phase unwrapping,207 and phase imaging204,208 (see Appendix).

5.4.3 Generative prior

The physics-driven fine-tuning approach, although effective for various CI applications and associated inverse problems, necessitates training a dedicated model $R_{\theta^{*}}$ for each task or system configuration, thereby limiting its adaptability. To address this, a novel method has been proposed that incorporates priors from both data and physics using a generative model.316 This technique involves optimizing the latent vector z of a pretrained image generative model to minimize the loss function
$$z^{*}=\arg\min_{z}\mathcal{L}\big(H(G_{\theta}(z)),g\big),\tag{16}$$
where z denotes the latent vector and $G_{\theta}(\cdot)$ is the generative model. By adjusting z, various images $G_{\theta}(z)$ can be sampled from the learned distribution. Eq. (16) thus selects an image that, when processed by the physical model H, best reproduces the raw measurements g. After optimization, the latent vector $z^{*}$ is used to generate the final estimate $f^{*}=G_{\theta}(z^{*})$ from the generative model. This process effectively seeks to minimize data discrepancy by selecting an image from a predefined distribution, leveraging the generative model as a natural constraint for addressing ill-posed inverse problems.
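
A latent-optimization sketch of Eq. (16) is shown below; the generator is frozen, only z is updated, and the attribute `latent_dim` is an assumed property of the hypothetical pretrained model.

```python
import torch

def generative_prior_reconstruction(g, forward_model, generator, iters=2000):
    """Generative-prior sketch: freeze the generator and optimize only the
    latent vector z so that H(G(z)) matches the raw measurements g."""
    z = torch.randn(1, generator.latent_dim, requires_grad=True)  # latent_dim assumed
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(iters):
        f_hat = generator(z)                     # an image from the learned prior
        loss = torch.mean((forward_model(f_hat) - g) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    return generator(z).detach()                 # f* = G(z*)
```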

The generative prior algorithm offers a straightforward means of harnessing both data and physics priors, thanks to the task-agnostic nature of generative models. This eliminates the need for task-specific model training as seen in the physics-driven fine-tuning approach. With the rapid advancement of artificial-intelligence-generated content (AIGC), high-fidelity 2D or 3D image priors can be readily integrated using sophisticated AIGC models.317 Even in scenarios where the object image cannot be generated by merely tuning the sampling vector z, the generative model’s weights θ can be adjusted to expand the search space.209,318,319 It is worth noting that GANs are not the only option for introducing generative priors. Any generative model capable of representing data distributions, such as variational auto-encoders320 and diffusion models,321 can also be used. Furthermore, when the pretrained weights of the generative model are not utilized and the input sampling vector z is held constant, this approach reduces to the untrained method. This concept has been successfully applied in compressive sensing,210,319,322,323 single-pixel imaging,209,211 phase retrieval,212,324,325 Fourier ptychography,213 medical imaging,214,326 image inpainting,322,327 and image deblurring.215

It is important to note that although the generative prior method, like physics-driven fine-tuning, utilizes a pretrained model and constrains the image refinement process through physics-driven mechanisms, its initial estimate does not directly correspond to the underlying target. Consequently, achieving real-time image reconstruction remains a significant challenge.

5.4.4 Self-supervised learning

The previously discussed approaches leverage the physics-consistency loss for a single test object to guide the NN weight optimization. This implies that after optimization, the NN model is tailored to provide good results for a specific test image only. However, a recent study has shown that physics consistency can also serve as a basis for training an NN model on a large dataset through the following equation:
$$\theta^{*}=\arg\min_{\theta}\mathcal{L}\big(H(R_{\theta}(g_k)),g_k\big),\quad g_k\in S_T.\tag{17}$$

Compared with Eq. (7), the labeled data $f_k$ are absent from the objective function, and the training set comprises solely raw measurements $g_k$. By minimizing this objective, the NN is trained so that its output $R_{\theta}(g_k)$ generates estimated measurements $H(R_{\theta}(g_k))$ that closely resemble the corresponding raw measurements $g_k$. However, as mentioned in Sec. 2, the inverse problem in CI is typically ill-posed, allowing multiple solutions, including infeasible ones, to minimize Eq. (17). The strategies discussed earlier tap into implicit image priors from the network architecture310 or from pretrained NN models95 to tackle the ill-posedness of inverse problems in CI. Here, for self-supervised learning, due to the lack of such constraints, the model trained using Eq. (17) may not yield satisfactory results. Nonetheless, this concept has made significant strides in the field of computer-generated holography (CGH).216,217,328,329 As CGH aims to design holograms for perfectly displaying known objects, multiple solutions are acceptable as long as the loss function is minimized. Therefore, the physics-driven self-supervised learning approach can offer a rapid (noniterative) and cost-effective (no need for preparing paired training data) method for hologram design. This approach has even been applied to projecting images through multimode fibers, where the forward physical model is established by a trained neural network.218 In a study on extracting particle size distribution from laser speckles, a trained network was similarly used to model the forward physical process, which was then employed to guide the self-supervised training.330
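
In contrast to the per-object loops above, the sketch below trains one network over a whole set of raw measurements using only the physics-consistency loss of Eq. (17); all names and hyperparameters are illustrative.

```python
import torch

def self_supervised_training(recon_net, forward_model, measurement_loader,
                             epochs=50, lr=1e-4):
    """Physics-driven self-supervised training (Eq. 17): the network is trained
    on raw measurements alone, with the physics-consistency loss H(R(g)) vs. g
    replacing any labeled data."""
    opt = torch.optim.Adam(recon_net.parameters(), lr=lr)
    for _ in range(epochs):
        for g in measurement_loader:               # no labels f are ever used
            f_hat = recon_net(g)
            loss = torch.mean((forward_model(f_hat) - g) ** 2)
            opt.zero_grad(); loss.backward(); opt.step()
    return recon_net
```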

It is important to note that this concept has also been applied in CI. For instance, in 2022, Li et al.219 introduced a self-supervised physics-driven approach to train an NN for restoring amplitude and phase information from two diffraction patterns. They demonstrated that even with self-supervised training, the NN could yield promising results. Similarly, Huang et al.220 proposed a comparable concept for phase imaging, conducting further comparisons between supervised and self-supervised training. They showed that the physics consistency loss used in self-supervised learning surpasses traditional structural loss functions often used in supervised learning. These conventional loss functions can lead to overfitting specific image features in the training dataset, causing generalization errors, especially for new types or classes of samples not encountered before.

5.4.5 Consistency regularization

The methods previously outlined adjust network parameters directly using a physics-consistency loss function. The consistency regularization approach, however, integrates data-driven loss functions with physics-consistency loss functions, as illustrated in Fig. 12(e). The objective function of this method is given by
$$\theta^{*}=\arg\min_{\theta}\mathcal{L}\big(R_{\theta}(g_k),f_k\big)+\lambda\,\mathcal{L}\big(H(R_{\theta}(g_t)),g_t\big),\quad \{g_k,f_k\}_{k=1}^{P}\in D,\;\; \{g_t\}_{t=1}^{Q}\in S_T.$$

Here, the physics-consistency loss can be viewed as a regularization term, ensuring that the network parameters are fitted to the given label data while adhering to the constraints imposed by the physical model. This regularized method has been successfully applied to imaging through scattering media,221,222 optical diffraction tomography,331 computed tomography (CT),223,224 magnetic resonance imaging (MRI),225,226 and spectrum imaging,227 yielding results that exceed those of traditional data-driven algorithms (without the physics-consistency loss term), especially in scenarios where a sufficient training dataset is not available.

It is important to note that by adjusting the ratio between the measurements g and the labels f, the consistency regularization method can adapt to different learning paradigms, including supervised learning, semi-supervised learning, and self-supervised learning. In supervised learning, labeled data pairs are utilized to assess the data-driven loss, and the measurements within these pairs are used to assess the physics-consistency loss, that is, where P=Q. In semi-supervised learning scenarios, there are a limited number of labeled data but a larger amount of unlabeled measurement data, indicated by P<Q. In self-supervised learning, only measurement data are present, denoted by P=0 and Q>0.
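
A single training step combining both terms might look as follows; batching, learning-rate scheduling, and the choice of λ are omitted or assumed, and all names are hypothetical.

```python
import torch

def consistency_regularized_step(recon_net, forward_model, opt,
                                 g_lab, f_lab, g_unlab, lam=0.1):
    """One training step of consistency regularization: a supervised loss on a
    labeled pair (g_lab, f_lab) plus a physics-consistency term on an unlabeled
    measurement g_unlab, weighted by lambda."""
    sup = torch.mean((recon_net(g_lab) - f_lab) ** 2)            # data-driven term
    f_hat = recon_net(g_unlab)
    phys = torch.mean((forward_model(f_hat) - g_unlab) ** 2)     # physics term
    loss = sup + lam * phys
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```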

5.5 Comparison of Different Implementations

This section offers a comparative analysis of various implementations that integrate physical models with DL frameworks in CI. As detailed in Fig. 18, we highlight representative applications of these implementations. We found that although existing physics-enhanced DL methods can be categorized into as many as 11 distinct approaches, they can fundamentally be grouped into two broad categories, as illustrated in Fig. 13: (1) DL methods incorporating physics priors and (2) optimization algorithms enhanced by DL techniques. Approaches such as learning from simulation, end-to-end optical design, diffractive neural networks, interpretable layers, unrolling/unfolding, self-supervised learning, and consistency regularization primarily leverage the physical models of CI systems to effectively train neural networks for inverse reconstruction tasks. Methods such as untrained, plug-and-play, physics-driven fine-tuning, and generative priors primarily employ DL techniques as constraints to guide or refine the results of model-based iterative optimization processes.


Figure 13.Comparison of typical physics-enhanced inverse reconstruction algorithms. (a) Deep learning methods incorporating physics priors. (b) Optimization algorithms enhanced by deep learning techniques.

The fundamental differences between these two types of methods result in distinct characteristics. DL methods incorporating physics priors have the advantage of fast reconstruction speed. However, they require access to training data and still face challenges such as poor generalization, limited interpretability, and the high computational cost of network training. By contrast, optimization algorithms enhanced by DL techniques do not require large amounts of training data and exhibit strong scalability and interpretability. Nonetheless, they suffer from the drawback of being time-consuming due to the iterative nature of the optimization process. In Table 2, we summarize the advantages and disadvantages of various DL-based methods in CI discussed in this review.

Table 2. Advantages and disadvantages of various DL-based methods in CI.

    Data-driven DL
      A: Fast, no need for physical modeling
      D: Requiring sufficient training data, poor generalization, poor interpretability, time-consuming training process

    Learning from simulation
      A: Fast, easy to generate training data
      D: Reliance on accurate physical modeling, poor generalization, poor interpretability, time-consuming training process

    End-to-end optical design
      A: Adaptive system design with differentiable optics, fast, easy to generate training data
      D: Reliance on accurate physical modeling, poor generalization, poor interpretability, time-consuming training process

    Plug-and-play
      A: Strong scalability, capability of using existing denoising networks without retraining
      D: Slow reconstruction, reliance on accurate physical modeling

    Diffractive neural network
      A: High speed, low power consumption
      D: Poor image quality, poor generalization, low flexibility in network architecture design

    Interpretable layers
      A: Fast, improving image quality, generalization, and interpretability
      D: Reliance on accurate physical modeling, requires sufficient training data, time-consuming training process

    Unrolling/unfolding
      A: Fast, reducing training data requirements, enhancing generalization and interpretability
      D: Reliance on accurate physical modeling, still a data-driven approach

    Untrained
      A: No need for training data, no generalization issues, good interpretability, suitable for large-scale imaging (with INR)
      D: Slow reconstruction, reliance on accurate physical modeling

    Physics-driven fine-tuning
      A: Fast, extendable to out-of-domain data, no downstream data required, good interpretability
      D: Time-consuming fine-tuning process, reliance on accurate physical modeling

    Generative prior
      A: Strong scalability, can use various generative models
      D: Time-consuming fine-tuning process, reliance on accurate physical modeling

    Self-supervised learning
      A: No need for ground truth data, fast
      D: Reliance on accurate physical modeling, poor generalization, poor interpretability, time-consuming training process

    Consistency regularization
      A: Fast, reducing training data requirements
      D: Reliance on accurate physical modeling, limited generalization, limited interpretability, time-consuming training process

We have additionally provided an example-driven narrative to illustrate how these physics-enhanced DL methods improve the real-world applications of CI tasks. As shown in Fig. 14, we first present some application examples of DL methods that incorporate physics priors. For instance, as depicted in Fig. 14(a), networks trained with data synthesized using an image formation model can enhance the resolution of real microscopic images compared with traditional deconvolution methods.97 By integrating metasurface structures as part of the neural network parameters and jointly optimizing them with the subsequent image deconvolution reconstruction network [Fig. 14(b)], nano-optics 550,000 times smaller than conventional compound optics can achieve comparable imaging quality.142 In Fig. 14(c), trained passive diffractive neural networks can handle holographic reconstruction at the speed of light, directly capturing twin-image-free on-axis holographic imaging results without any subsequent computational processing.187 Figure 14(d) demonstrates that speckle preprocessing based on the optical memory effect can eliminate the differences in speckle patterns of the same object under different scattering media, enabling neural networks to reconstruct object images in various scattering environments.109 In Fig. 14(e), a neural network unfolded from the ADMM algorithm achieves imaging speeds 20× faster than traditional model-driven methods, enabling interactive scene previews with a mask-based lensless imager.113 Figure 14(f) shows that by directly guiding neural network parameter optimization with a physics-consistency loss function, without any labeled training data, it is possible to reconstruct an object’s amplitude and phase from two or more diffraction patterns.220 Figure 14(g) illustrates how combining data-driven and physics-driven loss functions allows a spectrum-analysis network to resolve unseen spectra with multitone wavelengths, exhibiting better robustness in long-term measurements compared with purely data-driven networks.227 It is evident that although specific methods vary across CI tasks, these approaches ultimately result in trained neural network models that solve CI inverse problems. Compared with traditional data-driven DL, the incorporation of physics priors alleviates issues in data acquisition, generalization, and interpretability while also enhancing imaging performance.


Figure 14.Examples of deep-learning-based methods incorporating physics priors. (a) Resolution enhancement using a neural network model trained on simulation data, reproduced with permission from Ref. 97 © 2018 Springer Nature. (b) High-quality, full-color, wide FOV imaging using end-to-end designed nano-optics, reproduced with permission from Ref. 142 (CC-BY). (c) Holographic reconstruction using a diffractive neural network, reproduced with permission from Ref. 187 © 2021 American Chemical Society. (d) Deep learning for speckle imaging with interpretable speckle-correlation for preprocessing, reproduced with permission from Ref. 109 © 2021 Chinese Laser Press. (e) Unrolling a model-based optimization algorithm for lensless imaging, reproduced with permission from Ref. 113 © 2020 Optical Society of America. (f) Holographic reconstruction with a neural network model trained by self-supervised learning, reproduced with permission from Ref. 220 (CC-BY). (g) Spectrum analyzer using physical-model and data-driven model combined neural network, reproduced with permission from Ref. 227 © 2023 Wiley-VCH GmbH.

We also present some application examples of optimization algorithms enhanced by DL techniques. In Fig. 15(a), Wang et al.90 used a physics-driven loss to optimize the parameters of an untrained neural network, ensuring that the network outputs satisfy physical model constraints and accurately reflect the phenomenon of wave propagation in free space. This allowed phase-type object images to be reconstructed from a single diffraction intensity pattern without using any training data. In Fig. 15(b), Chang et al.184 employed a plug-and-play method to tackle large-scale phase retrieval problems, using an alternating projection solver and an enhancing neural network (the FFDNet denoising model161) to address measurement formation and statistical prior regularization, respectively. This framework compensates for the shortcomings of each operator, achieving high-fidelity phase retrieval with low computational complexity and strong generalization. In Fig. 15(c), Wang et al.95 fine-tuned neural network parameters with physics-consistency loss, eliminating artifacts caused by generalization issues in pretrained networks when reconstructing out-of-domain data. They successfully achieved high-quality single-pixel imaging (SPI) with a sampling ratio as low as 6.25% and validated the approach in an outdoor experiment of SPI LiDAR. In Fig. 15(d), Zhang et al.209 fine-tuned a generative model with physics-consistency loss, enabling the generator to produce images that satisfy physical constraints. Unlike Wang et al.’s approach, generative priors offer better flexibility as a single generative model can adapt to multiple imaging tasks. However, because the initial estimation is not constrained by the measurements, more iterative steps are required to achieve feasible results. These methods are essentially optimization algorithms. Compared with traditional optimization methods, the integration of neural networks introduces deep image priors or statistical prior regularization, leading to better convergence. However, because of the involvement of neural networks, the iterative optimization process often becomes more time-consuming.


Figure 15. Examples of optimization algorithms enhanced by deep learning techniques. (a) Untrained neural network optimized with a physics-consistency loss for phase imaging from a single diffraction pattern.90 (b) Plug-and-play framework combining an alternating projection solver with a pretrained denoising network for large-scale phase retrieval.184 (c) Physics-driven fine-tuning of a pretrained network for single-pixel imaging.95 (d) Fine-tuning a pretrained generative model under physics constraints.209

In addition, we have identified two key factors that are helpful in distinguishing these physics-enhanced DL methods: the reliance on physical knowledge and the need for training data. As shown in Fig. 16, the incorporation of a physical model can decrease reliance on training data, and vice versa. This is because both sources provide valuable information about the object of interest. Training data statistically estimate the object distribution, whereas physical models impose structural constraints based on raw measurements.


Figure 16.Comparison of required training data and required physical knowledge for computational imaging techniques based on different physics-enhanced deep learning approaches. Blending physical knowledge and deep learning usually results in a compromise between the required physics and training data. This suggests that physical knowledge can be used to reduce the required training data, whereas the data from the imaging system can also be used to reduce the required physics.

Conventional data-driven DL approaches, which depend exclusively on training data and eschew physical knowledge, necessitate extensive datasets but minimal physical understanding. In contrast, the untrained approach minimizes data reliance by eliminating the need for training data. Other methods leverage a combination of training data and physical knowledge. For instance, learning from simulation, end-to-end optical design, diffractive neural networks, interpretable layers, unrolling/unfolding, self-supervised learning, and consistency regularization all culminate in training an NN model on a specific dataset. Despite integrating physical models, these methods are fundamentally learning-based, offering rapid imaging speeds but grappling with generalization and interpretability challenges.

On the other hand, plug-and-play, generative prior, and physics-driven fine-tuning methods also employ data to train an NN model. However, these methods obtain final results through iterative optimization under the guidance of physical models, classifying them as optimization algorithms. These approaches demonstrate superior scalability and interpretability as the imaging outcomes adhere to physical model constraints. However, they generally require multiple iterations to achieve final results.

The physics-driven fine-tuning method occupies a unique position. When processing in-domain test data with a pretrained model, it operates as a learning-based algorithm. However, if out-of-domain data yield unsatisfactory imaging results, fine-tuning the network parameters with physical models is essential, effectively transforming it into an optimization algorithm. This dual nature allows physics-driven fine-tuning methods to amalgamate the swift imaging speeds of learning-based methods with the scalability and interpretability of iterative optimization algorithms.

In addition to comparisons in terms of reconstruction speed, generalization, and interpretability as discussed above, evaluation of image reconstruction quality also varies significantly across different physics-enhanced deep learning methods. When ground truth is available (e.g., in benchmark datasets), models can be directly evaluated using standard metrics such as PSNR and SSIM. For methods without access to ground truth, evaluation can be conducted using synthetic data. In such cases, although the ground truth is known, it is not used during training, allowing objective assessment of reconstruction fidelity in a controlled setting.

In practical experimental scenarios where ground truth is not available, one common strategy is to use a high-precision reference method to reconstruct the same object.85,95 The result from this method is then treated as a proxy ground truth for comparative evaluation. Alternatively, a back-projection of the reconstructed result can be performed, and the consistency between the reprojected data and the unused measurement data can be quantified as an indirect measure of reconstruction fidelity.228
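
The reprojection check in the latter strategy reduces to a few lines, sketched below under the assumption of a known forward model and a held-out subset of measurements not used during reconstruction.

```python
import numpy as np

def measurement_consistency(f_hat, g_holdout, forward_model):
    """Ground-truth-free evaluation sketch: re-project the reconstruction
    through the forward model and compare against held-out measurements."""
    g_pred = forward_model(f_hat)
    return np.mean((g_pred - g_holdout) ** 2)   # lower is more consistent
```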

Furthermore, Bayesian neural networks offer the capability to estimate uncertainty in the reconstruction process.332 This provides an additional means of assessing the reliability and fidelity of the reconstructed images, even in the absence of ground truth.89,128,333

As an illustrative example, consider the application of the physics-driven fine-tuning method in single-pixel imaging. Figure 17 illustrates that although analytical and iterative algorithms produce comparable results across various targets, they require a larger number of measurements for high-quality outcomes [as shown in the blue and gray lines of Figs. 17(a2), 17(a3), 17(b2), and 17(b3)]. Although data-driven DL methods perform well on in-domain test data, their efficacy plummets with out-of-domain data [as indicated by the green line in Figs. 17(a4) and 17(b4)]. In stark contrast, physics-enhanced DL can deliver high-quality reconstructions for diverse object types with a minimal number of measurements [represented by the purple line in Figs. 17(a5) and 17(b5)]. This underscores the potential of physics-enhanced DL to tackle a broad spectrum of inverse problems in CI, achieving optimal overall performance.


Figure 17.Comparison of typical inverse reconstruction algorithms for image reconstruction results under different measurement counts: single-pixel imaging example. (a) Results for the English letter “Q”: (a1) ground truth image; (a2) linear correlation algorithm; (a3) compressed sensing algorithm; (a4) data-driven deep learning; (a5) physics-enhanced deep learning. (b) Results for our logo: (b1) ground truth image; (b2) linear correlation algorithm; (b3) compressed sensing algorithm; (b4) data-driven deep learning; (b5) physics-enhanced deep learning. The data-driven deep learning algorithm uses a U-Net-like model trained on the EMNIST dataset, and the physics-enhanced deep learning method refines the trained U-Net using a physics-driven fine-tuning approach reported in Ref. 95. The results in (a2)–(a5) and (b2)–(b5) were all obtained with a measurement number of 819, and the image resolution was 64×64 pixels.

6 Conclusion and Perspective Remarks

This review has offered an extensive overview of the development of DL-based CI. We have introduced the fundamental concepts underlying CI and DL, elucidated how DL can be harnessed to address inverse problems in CI, and summarized the advantages and challenges associated with traditional data-driven DL. We have placed particular emphasis on the recently developed physics-enhanced DL methods, detailing their implementations through the integration of physics priors with data, networks, and loss functions. It is important to recognize that this is a rapidly evolving field, with new findings and breakthroughs emerging at a rapid pace. Consequently, encompassing all contributions and keeping abreast of all advancements in a single review is a formidable task. Despite this, we have identified several key trends that warrant further investigation.

6.1 Incorporating Advanced AI Technologies

The domain of AI is progressing at a swift pace, with innovative learning architectures and training methodologies being introduced regularly. Although these techniques may not have been specifically tailored for CI tasks, they present superior methods for feature representation learning, which could potentially enhance the extraction of prior information from data. For instance, few-shot learning algorithms may mitigate the need for extensive training datasets,334 lightweight neural networks can inherently boost computational efficiency,313,335 and Bayesian deep neural networks offer uncertainty quantification for network predictions.89,128,332,333 Drawing inspiration from neural network acceleration techniques can expedite the practical application of DL-based CI methods.336 The increasing prevalence of attention mechanisms in enhancing performance is evident in recent works.337–340 Moreover, the exploration of AutoML for automated neural network design is a promising avenue.341 In addition, foundation models may find utility in CI tasks.209,317,342,343

6.2 Addressing Inaccuracies in Forward Modeling

A fundamental challenge in the widespread adoption of physics-enhanced DL methods is the necessity for a high degree of similarity between the established forward physical model and the actual image formation process. When uncontrollable factors disrupt the imaging chain, such as scattering media, optical aberrations, or detector noise, the theoretical physical model can deviate significantly from the actual model. This deviation may result in the generation of data that does not align with the degraded measurements, potentially skewing network predictions. The main approaches to mitigate this problem include: (1) experimentally calibrating an empirical forward model and (2) implementing adaptive learning algorithms to remove the uncertainty within the image formation model.199,200,206,229 When the forward physical process changes over time, such as imaging through dynamic scattering media, the mixture of experts (MoE) framework can be used to train the model to adapt to different imaging conditions.146

6.3 Considering Large-Scale CI Problems

The escalating demand for higher imaging resolution and increased dimensionality in CI inevitably necessitates large-scale data processing.184,344 This requirement places significant demands on storage and computational resources for managing measurement matrices and executing image reconstruction. A common solution to this challenge involves establishing more powerful computational platforms in hardware, along with developing more efficient data storage and processing methodologies. Training a lightweight neural network model for noniterative object image reconstruction can facilitate large-scale real-time imaging tasks; however, this data-driven approach also presents challenges related to data acquisition, generalization, and interpretability.24,79,292,315 Emerging methods that utilize region-based reconstruction strategies, such as neural field approaches201,214,345 and the scalable iterative mini-batch algorithm,185 can significantly reduce the memory requirements for large-scale CI reconstruction. Photonic computing may also be a potential solution for large-scale CI problems, leveraging their capacity for high parallelism and low power consumption in computational reconstruction.74,190,346

6.4 Observing More Significant Objects

Most DL-based CI methods primarily focus on imaging standard test targets, such as SLM-loaded objects, resolution charts, or crafted samples, to showcase the merits of their methodologies. This diverges significantly from the real applications these imaging technologies aim to promote. We believe there are two main reasons: first, the widespread use of DL relies on large amounts of training data, which are often difficult to obtain for real-world CI applications; second, due to their technical backgrounds, algorithm researchers typically lack the capabilities for system construction, sample preparation, and result analysis required for practical applications, whereas those involved in system development may face challenges in developing advanced DL-based algorithms. Recently, as discussed in this review, significant progress has been made in DL-based CI algorithms, particularly after the incorporation of physics priors into DL.347 Some studies have begun applying learning-based approaches to observe high-value targets. For example, Medeiros et al.348 presented a novel dictionary-learning-based algorithm to recover high-fidelity images of the M87 black hole in the presence of sparse coverage, achieving the nominal resolution of the Event Horizon Telescope array. In the future, we believe that more DL-based CI methods will be utilized for observing high-value targets in fields such as biology and astronomy, becoming an important development direction within AI for Science (AI4S).349 It is important to note that imaging high-value targets imposes high requirements on image fidelity, as artifacts can lead to misleading interpretations, obscure critical details, and potentially result in incorrect conclusions about the underlying processes being studied. To assess the reliability of network predictions, methods such as Bayesian neural networks can be considered.332 These approaches quantify uncertainty, providing confidence intervals for each prediction.89,128,333

6.5 Benchmarking Platforms

As shown in Fig. 18, numerous DL algorithms have been proposed to solve various CI problems. However, many existing studies rely on proprietary or small-scale datasets, making it challenging to benchmark and evaluate methodologies effectively. Furthermore, inconsistencies in data preprocessing, network architectures, and evaluation metrics create additional barriers to reproducibility. The success of benchmarks in the field of computer vision (CV) offers a valuable blueprint for the CI community. In CV, datasets such as ImageNet130 have played a pivotal role in driving rapid advancements. These benchmarks not only standardize tasks but also enable fair comparisons between methods. For example, the ILSVRC helped revolutionize DL in CV, showcasing the performance gains achieved by DL and inspiring further innovations.75,350 We envision a similar transformation in CI through the creation of large-scale, open-access CI databases encompassing diverse imaging scenarios. Such databases would allow researchers to evaluate their algorithms under consistent conditions and promote innovation by highlighting the strengths and limitations of different approaches. However, adopting benchmarking practices in CI faces two significant challenges. (1) Real-world CI datasets are often expensive and time-consuming to obtain, involving complex experimental setups and skilled personnel. (2) Researchers may be hesitant to share proprietary algorithms and data due to intellectual property concerns or competitive pressures in academia and industry. To address these challenges, we propose that the CI community adopt common testing protocols and establish centralized benchmarking platforms. These platforms could serve as hubs for researchers to upload datasets, models, and results for standardized testing. Contributors would receive appropriate credit, fostering a culture of collaboration and transparency. Furthermore, usage restrictions could be implemented to protect intellectual property and ensure ethical practices.

6.6 Image-Free Sensing

The ultimate objective of imaging is to glean information about the target, with images acting as intermediaries that carry this information. Although current information acquisition largely depends on images, the evolution of CI suggests that obtaining images may not always be a prerequisite. Image-free sensing is an emerging direction in which the optical encoding component of CI is co-optimized with decoding algorithms, akin to the end-to-end optical design approach. The fundamental distinction is that in image-free sensing, the decoding network bypasses image generation, directly extracting the information contained within the images, such as object categories and positions.74 This can also be construed as an optoelectronic hybrid neural network, with the optical component performing feature extraction through encoding-pattern design and the electronic network performing pattern recognition on these extracted features.351 Image-free sensing offers significant efficiency advantages over image-based machine vision by selectively acquiring only the information most relevant to pattern recognition, thereby avoiding the processing of images laden with redundant data.302,352–354 It also inherently preserves privacy by capturing non-image-like features instead of producing human-interpretable images. We believe this approach represents a promising pathway toward low-latency, high-speed, low-power, and privacy-preserving neuromorphic vision.355
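A toy version of this co-optimization, under the single-pixel sensing assumption, is sketched below: a bank of learnable modulation patterns plays the role of the optical encoder, and a small classifier maps the resulting bucket measurements directly to class labels, so no image is ever formed. The model, its sizes, and the random stand-in data are illustrative assumptions, not a specific published design.

```python
# Minimal sketch of image-free sensing: learnable single-pixel modulation
# patterns (optical encoder) trained end to end with a classifier (electronic
# decoder) that predicts labels straight from the measurements.
import torch
import torch.nn as nn

class ImageFreeSensor(nn.Module):
    def __init__(self, n_pixels=28 * 28, n_patterns=32, n_classes=10):
        super().__init__()
        # Encoding: patterns that could be displayed on an SLM/DMD; the
        # sigmoid keeps them in [0, 1], like physically realizable masks.
        self.patterns = nn.Parameter(torch.randn(n_patterns, n_pixels))
        # Decoding: maps the bucket-detector readings directly to labels.
        self.classifier = nn.Sequential(
            nn.Linear(n_patterns, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, scene):                # scene: (B, n_pixels)
        masks = torch.sigmoid(self.patterns)
        g = scene @ masks.t()                # simulated measurements (B, M)
        return self.classifier(g)            # class logits; no image formed

model = ImageFreeSensor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scene = torch.rand(8, 28 * 28)               # stand-in training batch
labels = torch.randint(0, 10, (8,))
opt.zero_grad()
loss = nn.functional.cross_entropy(model(scene), labels)
loss.backward()
opt.step()                                   # encoder and decoder co-optimized
```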

6.7 Advanced Encoding Schemes

As mentioned in Sec. 1, the performance of CI is mainly determined by both the encoding and decoding processes. Although this paper focuses primarily on the decoding part, we also wish to highlight that the encoding aspect deserves close attention, even from those engaged in algorithm development. This is especially important because traditional encoding schemes used in CI are gradually being replaced by more advanced light-field control techniques.356–362 For example, metasurfaces have increasingly found applications in CI in recent years.363 These thin, lightweight elements are being used to replace traditional lenses in imaging systems364 and to encode high-dimensional light-field information,363 supplanting conventional dispersive elements such as prisms and gratings. This transition is not only enabling more compact imaging systems but also opening new avenues for system miniaturization, particularly through the direct integration of metasurfaces with detectors. We believe that despite the existing gaps between the latest modulation devices and the corresponding algorithms, which stem mainly from the disparate background knowledge each requires, the two will inevitably become more closely integrated in the future. After all, CI has always been the product of the joint design of optics and algorithms, and we have not yet reached a stage where the encoding schemes are fixed and only algorithm optimization remains.

We have witnessed a paradigm shift in CI reconstruction algorithms, evolving from traditional physics-based methods to data-driven DL methods, and now progressing toward physics-enhanced or physics-informed DL methods. By developing diverse physics-enhanced DL-based CI methodologies, we can strike a balance between the demands of data acquisition and the requirements of physical modeling, ultimately optimizing the performance of imaging systems. The integration of physics priors with DL offers new solutions to the common challenges faced by traditional data-driven DL methods, such as difficulties in acquiring training data, poor generalization, lack of interpretability, and time-consuming network training. We are confident that the further fusion of CI domain knowledge with cutting-edge DL technologies, alongside advances in information theory, improvements in computer hardware, and innovations in light-field control and signal processing, will enable DL-based CI, particularly methods that incorporate physics priors, to find broad applications across diverse fields, including astronomy, biology, scientific discovery, industry, transportation, and beyond.

7 Appendix

The table in Fig. 18 presents representative applications of physics-enhanced DL methods in CI. Owing to space constraints and the limits of our expertise, certain important works are inevitably omitted; their absence does not imply lesser significance.

Figure 18. Typical computational imaging applications based on different physics-enhanced deep learning schemes.82,90,91,93–95,97,109,113,115,124,142,143,146,149,151,175–289

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62325508, 62405334, and U22A2080), the National Key R&D Program of China (Grant Nos. 2024YFF0505604 and 2024YFF0505600), the Program of Shanghai Academic Research Leader (Grant No. 22XD1403900), the Shanghai Sailing Program (Grant No. 23YF1454200), the Shanghai Municipal Science and Technology Major Project, and the Reinhart Koselleck project Phys-Deep-Fiber: Physics-Informed Deep Learning Systems for Secure Information Transmission with Multimode Fibers (Grant No. 560574412).

Fei Wang is an associate researcher at the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences. His primary research lies in deep-learning-based computational imaging. He is particularly focused on leveraging artificial intelligence to address challenges in optical information acquisition arising from limitations in photo-detection, light collection, and optical propagation. He was awarded the Special Prize of the President Scholarship of the Chinese Academy of Sciences and the Innovative Paper Award of the Chinese Optical Engineering Society.

Juergen W. Czarske (Fellow of EOS, OPTICA, SPIE, IET, and IoP) is a full chair professor and director at TU Dresden, Germany. He is an international prize-winning inventor of laser-based technologies, computational imaging, biophotonics, and intelligent photonics. His awards include the 2008 Berthold Leibinger Innovation Prize, the 2019 OPTICA Joseph Fraunhofer Award, the 2020 IEEE Laser Instrumentation Award, the 2022 SPIE Chandra S. Vikram Award, the 2024 SPIE Dennis Gabor Award, and the 2025 Koselleck project prize. He is vice president of the International Commission for Optics (ICO).

Guohai Situ (Fellow of OPTICA) is the general manager of Shanghai Institute of Laser Technology Co., Ltd., and an adjunct professor at the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences. His research interests encompass a broad spectrum of deep-learning-based computational optical imaging. Currently, his focus is on imaging through optically thick scattering media, computational ghost imaging, and phase imaging. He was awarded the National Science Fund for Distinguished Young Scholars (NSFC, China).

[8] U. Leonhardt, Measuring the Quantum State of Light, 22 (1997).

[23] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging (1998).

[37] G. Marcus, "Deep learning: a critical appraisal" (2018).

[38] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (2016).

[47] R. C. Gonzalez and R. E. Woods, Digital Image Processing (2009).

[48] A. N. Tikhonov, "On the regularization of ill-posed problems," Dokl. Akad. Nauk SSSR 153, 49-52 (1963).

[52] D. J. Brady, Optical Imaging and Spectroscopy (2009).

[59] S. Ruder, "An overview of gradient descent optimization algorithms" (2017).

[64] A. Vaswani et al., "Attention is all you need," 30 (2017).

[71] "Phase unwrapping by neural network," 136-141 (1993).

[75] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," 1097-1105 (2012).

[83] M. Lyu et al., "Exploit imaging through opaque wall via deep learning" (2017).

[123] "Deep learning applications for hyperspectral imaging: a systematic review," Complex Intell. Syst. 9, 2713-2745 (2021).

[133] I. Goodfellow et al., "Generative adversarial nets," 2672-2680 (2014).

[135] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization" (2014).

[151] "Reference-less decomposition of highly multimode fibers using a physics-driven neural network" (2024).

[153] "Deep learning techniques for atmospheric turbulence removal: a review" (2024).

[165] "Deep learning for medical image processing: overview, challenges and future" (2017).

[168] "Blending diverse physical priors with neural networks" (2019).

[171] J. W. Goodman, Introduction to Fourier Optics (2005).

[173] J. Mertz, Introduction to Optical Microscopy (2019).

[191] "Super-resolution image display using diffractive decoders," Sci. Adv. 8, eadd3433 (2022).

[210] A. Bora et al., "Compressed sensing using generative models," 537-546 (2017).

[223] "A novel loss function incorporating imaging acquisition physics for PET attenuation map generation using deep learning," Lect. Notes Comput. Sci. 11767, 723-731 (2019).

[230] "Physical model simulator-trained neural network for computational 3D phase imaging of multiple-scattering samples" (2021).

[236] "Local conditional neural fields for versatile and generalizable large-scale reconstructions in computational imaging" (2023).

[256] "Quantitative phase imaging (QPI) through random diffusers using a diffractive optical network," Light: Adv. Manuf. 4, 206 (2023).

[283] A. Jalal et al., "Robust compressed sensing MRI with deep generative priors," 14938-14954 (2021).

[284] "Bayesian image reconstruction using deep generative models" (2021).

[307] "FourierNets enable the design of highly non-local optical encoders for computational imaging," 1-13 (2024).

[312] R. Heckel and P. Hand, "Deep decoder: concise image representations from untrained non-convolutional networks" (2018).

[315] "NeuPh: scalable and generalizable neural phase retrieval with local conditional neural fields," Adv. Photonics Nexus 3, 056005 (2024).

[317] "Sora: a review on background, technology, limitations, and opportunities of large vision models" (2024).

[319] "Uncertainty modeling in generative compressed sensing," 26655-26668 (2022).

[320] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes" (2013).

[321] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," 6840-6851 (2020).

[323] V. Shah and C. Hegde, "Solving linear inverse problems using GAN priors: an algorithm with provable guarantees" (2018).

[324] F. Shamshad and A. Ahmed, "Robust compressive phase retrieval via deep generative priors" (2018).

[325] P. Hand, O. Leong, and V. Voroninski, "Phase retrieval under a generative prior," 9154-9164 (2018).

[342] R. Bommasani et al., "On the opportunities and risks of foundation models" (2021).

[343] W. X. Zhao et al., "A survey of large language models" (2023).

[344] "Local conditional neural fields for versatile and generalizable large-scale reconstructions in computational imaging" (2023).

Citation: Fei Wang, Juergen W. Czarske, Guohai Situ, "Deep learning for computational imaging: from data-driven to physics-enhanced approaches," Adv. Photon. 7, 054002 (2025).

Paper Information

Category: Reviews

Received: Feb. 7, 2025

Accepted: Jul. 21, 2025

Posted: Jul. 21, 2025

Published Online: Sep. 4, 2025

Corresponding Author Email: Guohai Situ (ghsitu@siom.ac.cn)

DOI: 10.1117/1.AP.7.5.054002

CSTR: 32187.14.1.AP.7.5.054002
