Image reconstruction through a nonlinear scattering medium via deep learning

Shuo Yan; Yiwei Sun; Fengchao Ni; Zhanwei Liu; Haigang Liu; Xianfeng Chen

doi:10.1364/PRJ.523728

1. INTRODUCTION

The study of light behavior in complex media has contributed to the development of fields such as biological imaging [1], non-line-of-sight imaging [2], and pulse shaping [3]. In a complex medium, ballistic photons are exponentially attenuated with propagation depth. The medium spatially distorts the light between input and output [4], which gives rise to a random interference speckle pattern [5]. Various optical imaging approaches have been developed to overcome the detrimental effects of such complex media, such as confocal detection [6] and optical coherence tomography (OCT) [7]; however, the imaging depth of these methods is limited by the intensity of single-scattered waves, which decreases exponentially with depth. Diffuse optical tomography [8] exploits multiple-scattered waves to form an image and reaches much greater imaging depths than OCT, but with lower spatial resolution. Another common approach is scattering autocorrelation imaging [9,10], which enables noninvasive imaging but has a small field of view due to the optical memory effect. The wavefront shaping (WS) technique opened the way to controlling light propagation in turbid media, such as amorphous or disordered materials [11], biological tissues [12,13], complex photonic structures [14,15], and multimode fibers [16–18]. A transmission matrix (TM) [19,20] can be used to characterize the linear input–output relationship of a fixed scattering medium based on the superposition principle. By measuring the TM of the scattering medium, image reconstruction can be realized [21–23].

The scattering medium is typically treated as a linear operator in most previous research. However, many nonlinear effects provide superior performance in the field of biological imaging, such as two-photon excitation fluorescence (TPEF) microscopy [24], second-harmonic generation (SHG) microscopy [25], and coherent anti-Stokes Raman scattering (CARS) microscopy [26]. Compared with conventional imaging techniques, nonlinear optical imaging has the advantages of deeper imaging depth (TPEF and SHG microscopy) and higher spatial resolution (CARS microscopy). Nonlinear imaging is not limited to the optical field but also includes ultrasound imaging. Early ultrasound imaging technology was based on linear acoustic principles, whereas the nonlinear propagation of sound waves in biological tissues generates nonlinear acoustic signals and enables tissue harmonic imaging [27], which has better spatial resolution and contrast than linear ultrasound imaging. Nonlinear imaging has been demonstrated to enable label-free imaging of tissues. As in the linear scattering imaging process, nonlinear signals are also highly susceptible to scattering. Therefore, the corresponding nonlinear scattering optical imaging is also a fundamental problem in the field of imaging. In addition, random multiple scattering provides a new scheme for message encryption, while nonlinear scattering with higher complexity is expected to provide a more secure scheme for optical cryptography [28,29]. However, in nonlinear scattering media, which can radiate nonlinear signals, the coupling of multiple scattering and nonlinear processes makes it difficult to characterize the nonlinear scattering processes. Most nonlinear imaging work concentrates on either homogeneous nonlinear media [25] or linear scattering media combined with homogeneous nonlinear media [30]. Although the scattering matrix [31] and scattering tensor [32] of the nonlinear scattering medium have been proposed, their exponentially increasing complexity makes them unsuitable for image reconstruction. To the best of our knowledge, there is no report on image reconstruction through a nonlinear scattering medium.

Deep learning (DL) methods can extract intrinsic features and draw decision boundaries directly from data [33]. DL has been shown to be a powerful tool in scattering imaging [34], super-resolution imaging [33,35], holography [36], phase recovery [37,38], and optical orbital angular momentum communication [39], because it learns an end-to-end mapping without modeling the complex intermediate process. The theoretical basis for the efficacy of these methods in nonlinear image reconstruction is the universal approximation capability of multilayer feedforward networks, which enables them to approximate any continuous nonlinear function and its derivatives [40,41]. This principle underlies the ability of DL methods to extract intrinsic features and delineate decision boundaries directly from data without an explicit description of complex intermediate processes. For example, using DL methods, it is possible to achieve physics-informed imaging through unknown thin scattering media, which can lead to high reconstruction fidelity for sparse objects by training with only one diffuser [42]. Additionally, a DL method based on a projector network was developed to project colorful images through scattering media using three primary colors [43]. DL networks such as an improved U-Net have been used to harness an object’s polarization information from scattering images [44]. Furthermore, a DL network combining a gating network (GTN) and a deep neural network (DNN) can achieve a reasonable selection of polarization characteristics and use a single model to adapt to a wide range of scattering conditions [45]. However, the aforementioned research was conducted in linear systems. In the nonlinear domain, to our knowledge, there is no existing research utilizing DL methods. Because DL models implement nonlinear mappings, they have natural advantages in addressing the nonlinear image reconstruction problem [46].

In this paper, we develop an image reconstruction technique that restores the phase information of the fundamental frequency (FF) wave from the nonlinear scattering signal of a nonlinear scattering medium via a DL method. We use part of the images and the corresponding nonlinear speckle patterns as the training set to train the nonlinear speckle decoder network (NSDN) and the rest as the test set. The trained NSDN can then reconstruct the wavefront information of the FF wave from a nonlinear speckle pattern. Experiments with different data sets and different diffusers show that the proposed system has a strong capability for nonlinear image reconstruction and is robust under different conditions.

2. RESULT

The phase distribution of the image loaded onto the FF beam is destroyed when the FF signal interacts with the nonlinear scattering medium, which is $\mathrm{LiNbO}_{3}$ powder in the experiment (see Appendix A). The second-harmonic (SH) signal likewise acquires a distorted phase distribution and forms a nonlinear speckle pattern; the scattered SH signal at the $m$th output channel can be expressed as [32]

$$E_{m}^{out} (2 ω) = \sum_{n, o}^{N} k_{m n o}^{NL} E_{n}^{in} (ω) E_{o}^{in} (ω), \tag{1}$$

where $E_{n}^{in} (ω)$ and $E_{o}^{in} (ω)$ are the FF input fields at the $n$th and $o$th input channels, and $k_{m n o}^{NL}$ is an element of the scattering tensor $K^{NL}$, which contains the information on the generation and scattering of the SH signal. This process is much more complex than linear scattering, which can be expressed as $E_{m}^{out} (ω) = \sum_{n}^{N} k_{m n} E_{n}^{in} (ω)$; imaging through linear scattering media can be realized directly by inverting the transmission matrix. Due to the complexity of the nonlinear scattering process, it is difficult for traditional physical methods to reconstruct the phase distribution of the FF wave from the nonlinear harmonic speckle pattern. Here, we propose a convolutional neural network named NSDN to realize image reconstruction from nonlinear speckles; the process is shown in Fig. 1 (see Appendix B). Our experiment is based on the principle of a one-to-one relation between the speckle and the recovered image, achieved through speckle segmentation mapping with ground truth segmentation.
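To make the contrast between Eq. (1) and its linear counterpart concrete, the following NumPy sketch evaluates both mappings for a small toy system. It is only an illustration: the channel counts, the random complex entries standing in for $k_{m n}$ and $k_{m n o}^{NL}$, and all variable names are assumptions and are not taken from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 6  # hypothetical numbers of input and output channels

# Random complex stand-ins for the linear TM k_mn and the tensor k^NL_mno.
K_lin = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
K_nl = rng.normal(size=(M, N, N)) + 1j * rng.normal(size=(M, N, N))

# Phase-only FF input field, as produced by a phase-modulating SLM.
E_in = np.exp(1j * rng.uniform(0, 2 * np.pi, size=N))

# Linear scattering: E_out(w)_m = sum_n k_mn E_in_n
E_lin = K_lin @ E_in

# Nonlinear (SH) scattering, Eq. (1): E_out(2w)_m = sum_{n,o} k_mno E_in_n E_in_o
E_sh = np.einsum("mno,n,o->m", K_nl, E_in, E_in)

# The linear problem is inverted directly with the pseudo-inverse of the TM.
E_rec = np.linalg.pinv(K_lin) @ E_lin

# The SH field is quadratic in the input: doubling the input quadruples it,
# and a global sign flip of the input leaves it unchanged.
quad_scaling = np.allclose(
    np.einsum("mno,n,o->m", K_nl, 2 * E_in, 2 * E_in), 4 * E_sh
)
sign_ambiguity = np.allclose(
    np.einsum("mno,n,o->m", K_nl, -E_in, -E_in), E_sh
)
```

In this toy model the linear input is recovered exactly by a single matrix inversion, whereas the SH output depends on products of input channels, so no such direct inversion is available.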

Figure 1. Process of reconstructing the original image from the SH speckle. Different phase distributions of the image loaded onto the FF beam interact with the nonlinear scattering medium and generate different SH speckle patterns. The original images and SH speckle patterns are fed into the NSDN for joint training. An acquired SH speckle is then fed into the trained NSDN to reconstruct the original image.

The architecture of the NSDN is shown in Fig. 2. It is a U-Net architecture [47], which has proven its ability in image segmentation. The image reconstruction principle of U-Net is primarily based on its symmetric encoding–decoding structure, skip connections, and fusion of local and global information [48]. This structure helps retain more detailed information during upsampling and improves gradient flow, making network training more effective. We improve the feature representation ability of this network and reduce overfitting by inserting a DenseBlock after each convolutional layer. The DenseBlock design, which connects each layer to all preceding layers, reduces the number of parameters, enhances gradient flow, and maintains image details and contextual information, making it efficient and effective for image reconstruction tasks. Using DenseBlocks also brings additional benefits such as improved feature reuse and faster convergence [49]. Training is conducted using Python 3.6 on an NVIDIA Quadro P4000 GPU, and our code is implemented in TensorFlow. The network is trained with an adaptive moment estimation (Adam) optimizer, with binary cross-entropy as the loss function.
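The loss function and optimizer named above can be written out in a few lines. The sketch below is a plain NumPy illustration of pixel-wise binary cross-entropy and a single Adam update, applied to a toy $2 \times 2$ target; it is not the TensorFlow training code of the NSDN, and the learning rate, moment coefficients, iteration count, and function names are assumed for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean pixel-wise binary cross-entropy for images scaled to [0, 1]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(loss))

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy problem: learn per-pixel logits z so that sigmoid(z) matches a binary target.
target = np.array([[0.0, 1.0], [1.0, 0.0]])
z = np.zeros_like(target)
m = np.zeros_like(z)
v = np.zeros_like(z)
loss_start = binary_cross_entropy(target, sigmoid(z))
for t in range(1, 201):
    # Gradient of the mean BCE with respect to the logits of a sigmoid output.
    grad = (sigmoid(z) - target) / target.size
    z, m, v = adam_step(z, grad, m, v, t, lr=0.1)
loss_end = binary_cross_entropy(target, sigmoid(z))
```

The loss starts at $\ln 2$ for the uninformative prediction of 0.5 and decreases as the logits move toward the binary target.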

Figure 2. NSDN architecture. Each box corresponds to a multichannel feature map. The number of channels is indicated at the top of each box, and the $x$–$y$ size is provided at the left edge. The color of the boxes corresponds to different operation types, as listed in the lower-right corner of the figure. Arrows indicate the direction of data operations.

First, we evaluate the physical relevance between the FF image and the SH speckle using handwritten digits from the MNIST database, which serves as a concise data set to illustrate the reconstruction of images with relatively low spatial frequency. The images of the MNIST data set have only two grayscale levels (0 or 127, corresponding to a phase change of 0 or $π$), and their structure is relatively simple. The MNIST data set used in the experiment includes 60,000 patterns. The data set comprises original FF images (ground truth) and their corresponding SH speckles for the various object categories; 59,000 of these patterns are randomly selected as the training set, and the other 1000 patterns form the test set. Before training, we preprocess the SH speckle patterns recorded by the CCD. The original speckle patterns are about $150 \times 150$ pixels; they are cropped to the central $120 \times 120$ pixels and then resized to $64 \times 64$ pixels to fit the computational limits of the NSDN. Images in the training and test data sets are normalized between 0 and 1 to stabilize training. Then, we train the NSDN by backpropagation to learn the statistical relevance between the SH speckle patterns and the FF signals, with the goal of recovering the FF images from the nonlinear speckles.
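The preprocessing steps described above (center crop, resize, normalization, and random train/test split) can be sketched as follows. This is a minimal NumPy illustration under several assumptions: the interpolation scheme (bilinear), the normalization (per-image min-max), the random stand-in for a CCD frame, and all function names are hypothetical, since the text does not specify them.

```python
import numpy as np

def center_crop(img, size):
    """Crop a square of side `size` from the center of a 2D array."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def resize_bilinear(img, out_size):
    """Resize a 2D array to (out_size, out_size) by bilinear interpolation."""
    h, w = img.shape
    # Sample positions in the source image, aligned at the corners.
    ys = np.linspace(0, h - 1, out_size)
    xs = np.linspace(0, w - 1, out_size)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def normalize01(img):
    """Min-max scale to [0, 1]; a constant image maps to zeros."""
    img = img.astype(np.float64)
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img)

def preprocess_speckle(raw):
    """Crop to 120 x 120, resize to 64 x 64, and normalize to [0, 1]."""
    return normalize01(resize_bilinear(center_crop(raw, 120), 64))

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(150, 150))  # stand-in for one CCD frame
speckle = preprocess_speckle(raw)

# Random 59,000 / 1000 train-test split over pattern indices.
perm = rng.permutation(60_000)
train_idx, test_idx = perm[:59_000], perm[59_000:]
```

Splitting by a single random permutation guarantees that no pattern appears in both the training and the test set.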

Second, we test the NSDN by predicting the handwritten-digit phase distribution of the FF beam from the corresponding SH speckle. Representative examples of the speckle and prediction pairs for MNIST are shown in Fig. 3(a). The first row shows the phase patterns loaded on the SLM. We would like to point out that these ground-truth images are not seen during training. The second row displays the corresponding scattered SH speckles collected by the CCD. The third row presents the image information of the FF light predicted from the scattered SH signals by our trained NSDN. These results demonstrate that the nonlinear transmission relevance can be used successfully to reconstruct objects through scattering $\mathrm{LiNbO}_{3}$ powder. To quantify the difference between a reconstructed image and its original counterpart, we introduce two parameters: the peak signal-to-noise ratio ($α$) and the structural similarity index ($β$). $α$ is a full-reference image quality evaluation index,

$$α = 10 \log_{10} \left[\frac{(2^{n} - 1)^{2}}{\mathrm{MSE}}\right], \tag{2}$$

where $n$ is the number of bits per pixel, and the unit of $α$ is dB. $\mathrm{MSE} = \frac{1}{a b} \sum_{i=1}^{a} \sum_{j=1}^{b} [x (i, j) - y (i, j)]^{2}$ is the mean squared error, where the image size is $a \times b$ pixels, and $x$ and $y$ represent the original image and the predicted image, respectively. A larger value of $α$ means lower distortion of the image.

Figure 3. Reconstruction of MNIST data set. (a) Prediction results of test MNIST data set and (b) the corresponding $α$ and $β$ evolution curves in the training process. (c) Prediction evolution results of MNIST data set, with corresponding values of $α$ and $β$.

$β$ compensates for the inability of $α$ to measure the similarity of the image structure,

$$β (x, y) = \frac{2 μ_{x} μ_{y} + C_{1}}{μ_{x}^{2} + μ_{y}^{2} + C_{1}} \cdot \frac{2 δ_{xy} + C_{2}}{δ_{x}^{2} + δ_{y}^{2} + C_{2}}, \tag{3}$$

where $μ_{x}$ and $μ_{y}$ are the mean pixel values of images $x$ and $y$, $δ_{x}$ and $δ_{y}$ are the standard deviations of their pixel values, $δ_{xy}$ is the covariance of $x$ and $y$, and $C_{1}$ and $C_{2}$ are constants that prevent division by zero. $β$ is a number between 0 and 1. A larger value of $β$ likewise represents a smaller difference between the two images.
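Equations (2) and (3) translate directly into code. The sketch below is a NumPy illustration that evaluates $β$ once over the whole image rather than with a sliding window. The constants $C_{1} = (0.01 L)^{2}$ and $C_{2} = (0.03 L)^{2}$ with $L = 2^{n} - 1$ are the conventional choice and are an assumption here, since the text does not state their values; the function names and the random test images are likewise hypothetical.

```python
import numpy as np

def psnr(x, y, n_bits=8):
    """Eq. (2): peak signal-to-noise ratio in dB for n_bits-per-pixel images."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10((2 ** n_bits - 1) ** 2 / mse))

def ssim_global(x, y, n_bits=8, k1=0.01, k2=0.03):
    """Eq. (3) evaluated once over the whole image (no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    peak = 2 ** n_bits - 1
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2  # assumed stabilizing constants
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return float(luminance * structure)

rng = np.random.default_rng(0)
truth = rng.integers(0, 256, size=(64, 64))  # stand-in ground-truth image
noisy = np.clip(truth + rng.normal(0, 10, size=truth.shape), 0, 255)

alpha = psnr(truth, noisy)
beta = ssim_global(truth, noisy)
```

Identical images give an infinite $α$ and $β = 1$, while an all-black image compared with an all-white 8-bit image gives $α = 0$ dB.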

Figure 3(b) displays the $α$ and $β$ evolution curves during training, which begin to converge at the twentieth epoch; the test-set curves converge around epoch 13. After we changed the learning rate at the sixtieth epoch, the reconstructed image quality ($α$) improved significantly for the training set. An $α$ exceeding 20 dB can usually be considered acceptable imaging quality [50]. To show the training process intuitively, Fig. 3(c) displays the evolution of the predictions as the training epoch increases, exhibiting progressive improvements in image recovery.

To further verify the practicality of our system for image reconstruction, we use a data set that contains high spatial frequencies to test the case of more complex images. The CIFAR data set used in our experiment contains 60,000 images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The experiment is repeated with CIFAR in place of MNIST; the experimental reconstruction results are shown in Fig. 4(a). Compared with MNIST, the images in CIFAR have more details, including gray-scale values from 0 to 127 and irregular object shapes, which make them more difficult to reconstruct. The $α$ and $β$ evolution curves shown in Fig. 4(b) converge more slowly than those in Fig. 3(b), which reflects the difference in difficulty between simple and complex image recovery. We also note that the reconstructed images are still blurry and unrecognizable before epoch 40, as illustrated in Fig. 4(c), but they improve significantly after the fiftieth epoch. However, the reconstruction results remain inferior to those for MNIST, both visually and in terms of the evaluation parameters $α$ and $β$.

Figure 4. Reconstruction of CIFAR data set. (a) Prediction results of test CIFAR data set and (b) the corresponding $α$ and $β$ evolution curves in the training process. (c) Prediction evolution results of CIFAR data set, with corresponding values of $α$ and $β$.

To verify the robustness of our method to various scattering media, we test the generalization of the NSDN as a speckle decoder. In the experiment, we construct a new data set containing a total of 60,000 images drawn from MNIST and CIFAR as input images and randomly divide it into three groups of 20,000 images each. These three sets of images are loaded onto the SLM to modulate the wavefront of the FF wave, and the corresponding nonlinear scattering signals are collected. As in the previous experiment, 59,000 images are used as the training set, and 1000 images are used as the test set. The average particle sizes of the three diffusers used in the experiment range from 0.5 to 2 μm, as shown in Fig. 5(a). Since these three lithium niobate powders have different grain sizes, they exhibit distinct scattering characteristics [51], resulting in different intensity distributions of the nonlinear scattering field. The speckles and their reconstructed images are shown in Fig. 5(b). As the predicted images in Fig. 5(b) show, the recovery quality is almost the same as that obtained with a single diffuser in Figs. 3 and 4. This demonstrates that the proposed NSDN adapts to different nonlinear diffusers of the same class and indicates its strong robustness.

Figure 5. Verification of robustness of NSDN for different diffusers. (a) Scanning electron microscope image of $\mathrm{LiNbO}_{3}$ diffusers. (b) Reconstruction results for each data set using different diffusers, with each column corresponding to a specific diffuser.

Further, we demonstrate the ability of the trained NSDN to reconstruct images from a completely unseen object category. In these experiments, we use the same diffuser as in the first experiment. The MNIST and CIFAR data sets are each divided into 10 classes based on labels, and each class contains 6000 images. For MNIST, we put all 6000 images labeled “0” into the test set and the remaining 54,000 images labeled “1” to “9” into the training set. For CIFAR, we separate the “airplane” class as the test set containing 6000 images and feed the other 54,000 images into the training set. Representative prediction examples for the unseen classes are illustrated in Fig. 6. The results demonstrate that the NSDN can predict unseen images with relatively high accuracy for both simple binary and complex gray-scale images.
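The hold-out split by class described above can be sketched as follows. The helper name and the toy label vector (10 classes with 6 samples each instead of 6000) are assumptions chosen to keep the example small.

```python
import numpy as np

def split_by_unseen_class(labels, unseen_label):
    """Return (train_idx, test_idx) with one whole class held out for testing."""
    labels = np.asarray(labels)
    test_idx = np.flatnonzero(labels == unseen_label)
    train_idx = np.flatnonzero(labels != unseen_label)
    return train_idx, test_idx

# Toy label vector: 10 classes with 6 samples each.
labels = np.repeat(np.arange(10), 6)
train_idx, test_idx = split_by_unseen_class(labels, unseen_label=0)
```

Because the split is made on labels rather than at random, no image of the held-out class can reach the network during training.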

Figure 6. Reconstruction results of unseen class of MNIST and CIFAR data sets.

3. DISCUSSION

We conduct four different experiments to demonstrate that the proposed system has a strong capability for nonlinear image reconstruction and is robust under different conditions. First, we compare the predictions of experiments 1 and 2; the reconstruction results for CIFAR are worse than those for MNIST, as shown in Fig. 7(a). This is because the high-frequency information of an image is difficult to collect in the far field after scattering. The disparity in restoration quality between the CIFAR and MNIST data sets can be attributed to two factors. First, the limited resolution of the imaging system makes it challenging to capture high spatial-frequency information accurately. Second, the high spatial-frequency components are difficult to incorporate into the second-harmonic process and are more vulnerable to background noise. Figure 7(a) also gives a more detailed comparison of the evaluation indicators and compares the effects of different diffusers. The different diffusers have almost no effect on the reconstruction performance, which suggests that the NSDN exhibits a physical invariance when trained and tested on the same object data set and that it decodes the correlation between diffusers and speckles to almost the same degree. This demonstrates the robustness of the NSDN to the intensity distribution of the speckle. Next, as shown in Fig. 7(b), the recovery quality for unseen classes is close to the original results, which were obtained by training on all classes. The prediction accuracy maintained across unseen SH speckles of the same data type suggests that there are learnable and generalizable features in our NSDN model. This result reveals that the trained model has a universality in nonlinear image recovery and opens a possibility for decoding high-dimensional complex phase information. The code and a set of examples are available in Code 1 and Dataset 1 [52,53].

Figure 7. Quantitative evaluation of the NSDN performance. 1st and 2nd denote different diffusers; M and C denote the MNIST and CIFAR data sets, respectively; US denotes the use of unseen classes as the test data set. (a) Different diffusers. (b) Unseen classes.

In the field of scattering image reconstruction, the TM method is commonly employed. This approach enables rapid and precise recovery of various images after light passes through a strongly scattering medium [54]. Moreover, compared with OCT, the optical imaging matrix method extends the imaging depth limit for biological tissues by a factor of 2 [55]. Additionally, autocorrelation methods can be used to detect the optical field through the scattering medium. This technique leverages the optical memory effect without relying on prior information, allowing for noninvasive imaging of fluorescent objects completely hidden behind an opaque scattering layer [9]. While it avoids the invasive nature of wavefront shaping and TM measurement, the autocorrelation method requires a time-consuming scanning process, which limits its capability for real-time imaging through scattering media. Compared with these traditional methods, the DL method offers automated feature extraction, making image reconstruction more efficient and accurate. Further, the DL method exhibits a certain degree of robustness, thereby enhancing its potential for practical applications.

4. FUTURE WORK AND APPLICATIONS

We have demonstrated a deep learning framework for image reconstruction from the nonlinear signal of a scattering medium. Because cross-terms of the input field make the rank of the nonlinear scattering tensor $(N + 1) / 2$ times larger than that of its linear counterpart (where $N$ is the number of input modes) [32], a huge amount of computational power is needed to recover information about the input field. Although it is difficult to give a specific input–output relationship, our proposed method successfully solves this nonlinear reconstruction problem. In addition, it can be extended to other nonlinear frequency conversion processes such as third-harmonic generation, four-wave mixing, and stimulated Raman scattering with proper configurations, which greatly extends the application scenarios of scattering imaging. Secure communication is also important in modern information society. In conventional communication schemes, two communicating parties share separate keys generated using optical methods to encrypt information transmitted over a public channel. Only the receiver holding the key can decrypt the information. The TM method provides a new encryption scheme that utilizes the random nature of the scattering medium to encrypt the information [28,29], and decryption can only be achieved by the receiver who possesses the TM of the scattering medium. However, these applications are all based on linear scattering media, making optical encryption vulnerable to a powerful attack known as a chosen ciphertext attack (CCA) [56], which is analogous to measuring the TM of the scattering medium. If an attacker gains access to the information in the TM, any ciphertext can be directly decrypted by the inverse of the TM. Our proposed image transmission scheme based on a nonlinear scattering medium can realize nonlinear optical encryption. For nonlinear optical encryption, attackers require $O (N^{2})$ plaintext–ciphertext pairs for full decryption, which makes the attack much more complex than for a linear scattering matrix. Therefore, optical encryption based on nonlinear scattering media is secure as long as the trained network is not stolen.
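The vulnerability of linear scattering encryption can be illustrated with a toy NumPy model in which an attacker recovers the TM from $N$ known plaintext–ciphertext pairs and then decrypts a new ciphertext. The square random complex matrix used as the key, the mode number, and the variable names are assumptions for illustration only. The last lines count the independent unknowns per output channel: because $E_{n}^{in} E_{o}^{in} = E_{o}^{in} E_{n}^{in}$ in Eq. (1), only $N (N + 1) / 2$ distinct products occur, which is $(N + 1) / 2$ times the $N$ unknowns of the linear case.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # hypothetical number of input modes (and output modes, for simplicity)

# Secret linear key: a random complex transmission matrix.
T = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))

# The attacker obtains N plaintext-ciphertext pairs (columns of P and C).
P = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
C = T @ P

# N independent pairs are enough to solve C = T P for the full TM.
T_est = C @ np.linalg.inv(P)

# Any further ciphertext can then be decrypted with the estimated TM.
secret = rng.normal(size=N) + 1j * rng.normal(size=N)
recovered = np.linalg.solve(T_est, T @ secret)

# Unknowns per output channel: N for the linear TM, N(N+1)/2 for the
# symmetric part of the second-order tensor in Eq. (1).
unknowns_linear = N
unknowns_nonlinear = N * (N + 1) // 2
```

For $N = 8$ this gives 8 unknowns per output channel in the linear case and 36 in the second-order case, and the gap grows quadratically with $N$.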

5. CONCLUSION

In conclusion, we develop a method for image reconstruction from the nonlinear signal of a scattering medium by using our NSDN. As far as we know, this is the first precise reconstruction of FF image information from nonlinear speckles generated by a nonlinear scattering medium. Further, the proposed NSDN is able to restore the initial information through different diffusers and to reconstruct images from a completely unseen object category. Our approach promises highly stable, large-scale nonlinear information transport through a complex scattering medium. We expect that this technique can be applied to arbitrary image reconstruction processes involving nonlinearity and to information encryption.

Category: Nonlinear Optics

Received: Mar. 13, 2024

Accepted: Jul. 7, 2024

Published Online: Sep. 2, 2024

The Author Email: Haigang Liu (liuhaigang@sjtu.edu.cn), Xianfeng Chen (xfchen@sjtu.edu.cn)

CSTR:32188.14.PRJ.523728