Target extraction through strong scattering disturbance using characteristic-enhanced pseudo-thermal ghost imaging

Xuanpengfan Zou; Xianwei Huang; Wei Tan; Liyu Zhou; Xiaohui Zhu; Qin Fu; Xiaoqian Liang; Suqin Nan; Yanfeng Bai; Xiquan Fu

doi:10.3788/COL202422.121103

1. Introduction

Scattering media, such as dense fog and biological tissues, are a major hindrance for detecting information, particularly in optical imaging^[1–4]. The light propagation is strongly perturbed due to the inhomogeneity of the scattering medium. In most cases, the scattering medium is either unknown or incompletely characterized^[5–8]. Therefore, the effective strategy for maximizing information recovery must be two-pronged: minimizing interference from the scattering medium and leveraging prior knowledge about the imaging targets^[9–11]. For target extraction, characteristic recognition is a common way, and the prior knowledge of imaging objects can be well utilized by deep learning. However, the target extraction is still affected by environmental disturbances, and the extraction precision is limited.

As is well known, ghost imaging (GI) is an anti-interference imaging technology that obtains target information from the correlation between two correlated beams^[12–21]. Recently, research about GI in scattering media has greatly intensified^[22–28]. Moreover, the operation of the light source has also proven to be useful for obtaining object characteristics because GI is the corresponding imaging technique^[29]. In addition, it is feasible to use kernels in convolutional neural networks to adjust the second-order correlation of the speckle patterns, so as to enhance imaging quality^[30]. Based on these research insights, if the light source in GI includes object (one cares about) characteristics, it is beneficial for estimating if there exists the target in scattering media, thus extracting it.

In this paper, we propose the characteristic GI scheme for target extraction based on enhancing the characteristic response of the light source with deep learning. In our scheme, the light source is trained to contain object characteristics by the U-Net neural network and then is used for target extraction in GI. The numerical and experimental results validate our approach, demonstrating successful recognition of the target we care about with minimal measurements in complex imaging environments. The results present potential applications of target extraction through strong scattering interference.

2. Principle and Methods

The network structure is U-Net, which is a characteristic fusion network, as shown in Fig. 1(b). The network consists of two parts: the left side can be considered as an encoder for the characteristic extraction of training images, and the right side can be seen as a decoder for the characteristic matching of the output label. The encoder has five sub-modules, each of which contains two $3 \times 3$ convolutional layers. After each sub-module, there is a lower sampling layer realized by a $2 \times 2$ max pool. The resolution of the input image is $128 \times 128$ . The decoder contains five sub-modules with consistent output resolution to the input image. The network also uses skip connections, which connect the upsampled results with the input results of the same resolution in the decoder. It is noted that the imaging resolution is closely related to the number of sub-models in the process of training with the U-Net network. To get a higher resolution, the number of sub-models must be increased to better extract characteristics. If the resolution is increased without increasing the number of sub-models, the characteristic extraction will be inadequate, resulting in the loss of target characteristics and affecting the accuracy of the model.

Figure 1.(a) Schematic of the experimental setup. L1–L3 are the optical lenses, and GG is the rotary ground glass. (b) Network architecture: U-Net. The U-Net consists of a characteristic extraction path (left side) and a characteristic matching path (right side) to introduce target characteristics into the light patterns.

Download full size

View all figures

Figure 1(a) shows the experimental setup of the GI system. Traditionally, the speckle pattern $I_{r i} (x, y)$ is projected by a projector. After the beam expansion by lenses L1 (focal length F₁ = 100 mm) and L2 (focal lengths F₂ = 150 mm), the speckle pattern interacts with the object $O (x, y)$ . Then, the speckle field is detected by a bucket detector after a collecting lens L3 (focal length F₃ = 100 mm) and can be expressed as $B_{i} = \sum_{x, y} I_{r i} (x, y) O (x, y) .$ (1)

For traditional GI, the image of the object can be reconstructed by an intensity fluctuation correlation between $I_{r i} (x, y)$ and bucket signals $B_{i}$ ^[18], $G (x, y) = ⟨ Δ I_{r i} (x, y) Δ B_{i} ⟩ = ⟨ I_{r i} (x, y) B_{i} ⟩ - ⟨ I_{r i} (x, y) ⟩ ⟨ B_{i} ⟩,$ (2)where the operator $⟨ \dots ⟩$ is the ensemble average. Since the self-correlation $g^{(2)} (x, y)$ of the light field $I_{r i} (x, y)$ can be expressed as^[31] $g^{(2)} (x, y) = ⟨ I_{r i} (x_{1}, y_{1}) I_{r i} (x_{2}, y_{2}) ⟩ - ⟨ I_{r i} (x_{1}, y_{1}) ⟩ ⟨ I_{r i} (x_{2}, y_{2}) ⟩,$ (3)the reconstructed image can be further rewritten as $G (x, y) = \frac{1}{α} O (x, y) \otimes g^{(2)} (x, y),$ (4)where $α = \sum_{x, y} O (x, y)$ , and $\otimes$ represents the convolution. Obviously, GI is the convolution between the self-correlation of the light field and the object.

To enhance the characteristic response of the illuminating pseudo-thermal light, the training model is generated in the network using the training object and transmitted light that passes through the object; then, the patterns are input into the training model to generate a new light source $I_{r i}^{'} (x, y)$ , and its self-correlation can be expressed as^[32–36] $g^{' (2)} (x, y) = g^{(2)} (x, y) + \sum O_{i}^{'} (x, y),$ (5)where $\sum O_{i}^{'} (x, y)$ is the characteristic set related to the training object obtained by the neural network. Each subset $O_{i}^{'} (x, y)$ is related to different characteristics of the training object. In the characteristic extraction of the deep learning network, the characteristic set $\sum O_{i}^{'} (x, y)$ is the convolution of the kernel function, and the input, that is, the characteristic set, is the deep correlations within the output objects based on the kernel function. Thus, each subset $O_{i}^{'} (x, y)$ can be considered as an independent part of the characteristics, such as an independent spatial or spectral component. The self-correlation process can be understood as a re-distribution of the spatial correlation from different characteristic subsets. For the trained light field, the self-correlation enhances the characteristic response of the target, which indicates that the characteristic information of the target is contained to the light field.

In our proposed GI model with deep learning, the patterns after network training are used as the illumination source, and the reconstructed image can be expressed as $G^{'} (x, y) = O (x, y) \otimes g^{' (2)} (x, y) = O (x, y) \otimes g^{(2)} (x, y) + O (x, y) \otimes \sum O_{i}^{'} (x, y) = G (x, y) + O (x, y) \otimes \sum O_{i}^{'} (x, y) .$ (6)

From Eq. (6), the ghost image $G^{'} (x, y)$ can be considered the superposition of two parts: the result of the original GI $G (x, y)$ and the convolution between the training object characteristic set and imaging object. In practice, $G (x, y)$ contains less object information due to low sampling, resulting in imaging results similar to the form of the speckle field, which produces an interference effect similar to the noise and affects target recognition. However, in our method, the final imaging result includes both the results of traditional ghost image $G (x, y)$ and the convolution term $O (x, y) \otimes \sum O_{i}^{'} (x, y)$ . When the target cannot be recognized by traditional GI, $G^{'} (x, y)$ relies on the convolution part, which contains the characteristics of the training object $\sum O_{i}^{'} (x, y)$ . Therefore, the convolution part will highlight the characteristics similar to the training target to achieve target extraction. Thus, if the imaging target is similar to the training one, the convolution can highlight the target in the reconstructed result.

Here, it is natural to ask if the pseudo-thermal light that has been trained is suitable to realize GI. Figure 2 presents the intensity fluctuation correlation distributions (self-correlations) of the speckle under different training objects [see Figs. 2(a1)–2(d1)] and training epochs. The measurements are 500. Figure 2(e) illustrates the second-order coherence ( $g^{(2)}$ ) for different epochs when using the pig as the training object, and Fig. 2(f) demonstrates $g^{(2)}$ for different training objects with 20 training epochs. In Figs. 2(a2)–2(d2), the backgrounds contain shadows of the corresponding training objects, which indicates that the specific object characteristics within the light source can be enhanced. The phenomenon can be explained from Eq. (5). The self-correlation of the trained light field consists of the self-correlation of the original light field and the characteristic set related to the training object. However, since the distribution of the characteristic set is not concentrated in the whole light field, it behaves as a shadow with less intensity than the light field. In addition, the characteristics are discrete and distributed in the process of deep learning. Therefore, in the characteristic fusion of the final light field, the fusion of different characteristics to the light field is not uniform, and some characteristics will be greatly enhanced. As shown in Fig. 2(c2), the bow on the cat’s head can be regarded as an independent spatial distribution characteristic, which appears multiple times throughout the self-correlation, indicating that the characteristic blends well with the light field.

Figure 2.Intensity correlation distributions of the pseudothermal light under different training objects and training epochs. (a1)–(d1) Different training objects. (a2)–(d2) Intensity correlation distributions of the light field under the corresponding training objects. g⁽²⁾ of the trained speckle patterns with different (e) training epochs and (f) training objects, respectively.

Download full size

View all figures

It is shown in Figs. 2(e) and 2(f) that $g^{(2)}$ after training is related to the training targets and periods. With the increase of training periods, the peak value of $g^{(2)}$ first increases and then decreases because the insufficient and overfitting training process leads to insufficient or oversaturation of characteristic enhancement in the light field. For the training targets, there is a large difference in $g^{(2)}$ for different training targets. The peak value of $g^{(2)}$ is the largest when the training target is the dog, while the peak value of $g^{(2)}$ is the smallest when the training target is the pig. This difference is caused by two aspects: on the one hand, the characteristic matching degree of different training targets is different during the training process so that the completion degree of their respective characteristic extraction is not in the same situation. On the other hand, the fusion degree of different characteristics and the light field is also different, resulting in different characteristic enhancements in the light field during the process of light field characteristic fusion. Despite the differences in $g^{(2)}$ , the peak values consistently remain above 1, i.e., the light field after network training is also suitable for GI.

To quantitatively measure the imaging quality, the signal-to-noise ratio (SNR) is used in the following discussion^[37]: $SNR = \frac{⟨ G_{s} ⟩ - ⟨ G_{n} ⟩}{\sqrt{0.5 (D_{s} + D_{n})}},$ (7)where $G_{s}$ and $G_{n}$ are the ensemble averages of the reconstructed image signal with one and zero transmission, respectively. $D_{s}$ and $D_{n}$ are the corresponding variances.

3. Results and Discussion

3.1. Target extraction with the characteristic-enhanced light field

We first demonstrate the ability of target extraction from complex scenarios, thus verifying the above theoretical model and analysis. The simulation and experiment are shown in Fig. 3. Here, we consider separating the overlapping targets using the trained light field. In the experiment, the gain of the bucket detector is 70 dB, and the time interval of the projector is 0.1 s. Three different objects (cat, dog, and pig) etched on the $40 mm \times 40 mm$ aluminum plate are selected as training objects [Figs. 3(b1)–3(d1)]. The imaging scene is the superposition of three objects, as shown in Fig. 3(a1). The imaging quality is significantly bad due to low measurements (500 measurements) and occlusion disturbances between targets in traditional GI, resulting in the inability to recognize targets effectively, as depicted in Figs. 3(a2) and 3(a3).

Figure 3.(a1) GI results under pseudothermal light (first column) and light sources with different training characteristics (second to fourth columns). (a1) The imaging object and (b1)–(d1) different training objects. (a2)–(d2) The simulation results. (a3)–(d3) The corresponding experimental results.

Download full size

View all figures

Since our imaging target comprises a superposition of multiple distinct training objects, each individual object [Figs. 3(b1)–3(d1)] is introduced into the deep learning network to obtain its corresponding characteristic-enhanced light field; subsequently, these trained light fields are used for GI. The results show that the proposed method can effectively separate different objects, and the outlines of the real objects are obviously presented in both simulation [Figs. 3(b2)–3(d2)] and experiment [Figs. 3(b3)–3(d3)], leading to a remarkable improvement of target recognition ability. Comparing the imaging quality of different reconstructed images, ghost images with the dog have a significantly larger SNR than those with the cat and pig. The phenomenon indicates that the imaging quality is closely related to the characteristic response in the light source, which can be explained from $g^{(2)}$ in Fig. 2(f). The peak value of $g^{(2)}$ with the training object being the dog is higher than those with other animals. As described in Eq. (6), the ghost image can be considered a superposition of the original imaging result and the characteristic convolution between the training object and the imaging object. Therefore, when the target cannot be recognized by traditional GI, the convolution enhances the corresponding target information in the imaging results. In other words, similar characteristics to the training target can be effectively identified.

In addition, it is noticed that the bow characteristic on the cat head performs better than the whole cat, which is related to the enhancement intensity of the target characteristic in the light field. As can be seen from the results of the light field self-correlation in Fig. 2(c2), the bow on the cat head has a lot of shadows in the self-correlation, which not only indicates that the light field and the bow characteristic are integrated very well but also reveals that the bow characteristic in the light field has a larger enhancement. Therefore, the bow characteristic in the ghost image is clearer when compared with the other parts of the cat.

In practical application, there may exist multiple imaging objects with the same characteristics that need to be extracted in complex scenarios. Therefore, it is very important to recognize simultaneously these targets. Figure 4 shows the multi-target scenarios (the superposition of some animals), and there is more than one animal of each type. Here, we want to know if there exist any chickens in these scenarios and how many there are. The goal cannot be achieved by traditional GI [see Figs. 4(a1) and 4(a3)], while it is interesting to find that the target can be extracted using the characteristic-enhanced light field [see Figs. 4(a1)–4(a4)] when the training object is a chicken [the upper left parts of Figs. 4(a) and 4(b)]. In more complex scenarios with multiple targets of the same species that need to be identified, the traditional GI also fails [see Figs. 4(b1) and 4(b3)]. Interestingly, the targets sharing an identical shape with the training object can still be recognized using the trained light field, and the corresponding quantities are also clearly discernible, as depicted in Figs. 4(b2) and 4(b4). The results demonstrate that our scheme exhibits a remarkable ability for multiple-target extraction when the targets own similar characteristics. In addition, although the chicken has the best imaging quality under single-target extraction, some characteristics of the chicken are not obvious, such as the chicken comb. The reason is that the chicken and the rabbit have some similar characteristics, such as the chicken comb and the rabbit ears, which cause some interference with the target extraction. Therefore, similar targets have shortcomings in the part of characteristic extraction, but the targets can still be recognized.

Figure 4.Single- and multiple-target extraction results (500 measurements) in a complex scenario under a specific training target [the upper left parts of (a) and (b)]. (a1), (a3) and (b1), (b3) The results under pseudo-thermal light. (a2), (a4) and (b2), (b4) The results under the trained light. (a1), (a2) and (b1), (b2) Simulation results. (a3), (a4) and (b3), (b4) Experimental results.

Download full size

View all figures

3.2. Target extraction through a strong scattering environment and its application in biomedical imaging

From the above analysis, our method can be used for single- and multiple-target extraction in complex scenarios. In practical imaging scenarios, the extraction ability may be subject to scattering interference, so it is important to investigate the target extraction ability in scattering environments. Ground glass is generally considered a surface scattering medium, so we place the rotating ground glasses in front of and behind the object to simulate a strong scattering environment in our experiment, and the results are shown in Fig. 5. Due to scattering interference, traditional GI fails to recognize targets for single-target extraction [Figs. 5(a2) and 5(b2)], and distinguishing the multiple-target is even more challenging [Fig. 5(c2)]. However, our method presents better target extraction ability, and the SNR is further improved [Figs. 5(d2)–5(h2)] when compared with traditional GI. For example, Figs. 5(c2) and 5(h2) show the multiple-target extraction results under the pseudo-thermal light source and characteristic-enhanced light source, respectively. It is difficult to distinguish the target using traditional GI, while the trained light field can efficiently obtain the corresponding target we are concerned about and reduce the interference of the scattering environment caused by rotating ground glasses. Furthermore, the presence of scattering only slightly degrades the SNR and does not impede target extraction in comparison to results obtained without the scattering environment [Figs. 4(b4) and 5(h2)]. In other words, enhancing the characteristic response of the light source effectively mitigates the impact of scattering at a low sampling rate.

Figure 5.The experimental results under different imaging scenarios in a strong scattering environment (500 measurements). (a1)–(h1) The corresponding imaging objects and different training objects (the upper left part of each picture). The imaging results under pseudo-thermal light [(a2)–(c2)] and enhanced characteristic lights [(d2)–(h2)], respectively.

Download full size

View all figures

Then, we consider a practical scattering environment, such as biomedical tissue. In this process, it is inevitable to encounter the influences of scattering environments such as muscles and blood vessels. Taking bone imaging as an example, X-ray is traditionally used to solve this problem, which can directly obtain bone information through soft tissues. Then, we attempt to extract bone information using our scheme with the trained pseudo-thermal light. Here, the imaging objects are real zebrafish and crucian carp in the experiments, as shown in Figs. 6(b1) and 6(b2), and the training object is X-ray photographed fish bones shown in Figs. 6(a1) and 6(a2). The area in the dotted box is the imaging area, and the corresponding results are presented in Figs. 6(c1) and 6(c2). It is shown that GI with the trained light fields is sufficient for recognizing the fish bones at low measurements of 500. For the small zebrafish, the spine of almost the entire body is obtained in Fig. 6(c1), and the details of the spine can also be identified [see the upper right corner of Fig. 6(c1)]. For the larger crucian carp, the extraction ability of small bones (such as ribs) is not as good as that of the small fish due to the thicker fish body. However, the spine can also be effectively identified, which verifies the target extraction ability of our method in a practical scattering scenario.

Figure 6.Experimental results of fish bone extraction (500 measurements). (a1), (a2) X-ray photos of zebrafish and crucian carp bones. (b1), (b2) Real photos of zebrafish and crucian carp. (c1), (c2) Results using our imaging method.

Download full size

View all figures

We also notice the degradation of image quality with the increase of the fish thickness, the reason being that an increase of the fish thickness reduces the signal intensity received by the bucket detector. Therefore, it is necessary to investigate the influence of the detected signal intensity and fish thickness on characteristic extraction, and the results are plotted in Fig. 7. Here, the detected SNR ( $S / R$ ) $S / R = (D_{s} - D_{n}) / D_{n}$ is used to describe the signal intensity detected by the bucket detector with $D_{s}$ being the detected value in the presence of light and $D_{n}$ being the noise intensity in the absence of light. A smaller $S / R$ corresponds to a weaker signal received by the bucket detector.

Figure 7.Experimental results of fish bone extraction (500 measurements). (a1)–(a3) Real photos of zebrafish and crucian carp with different thicknesses. (b1)–(f1), (b2)–(f2), and (b3)–(f3) The results from our method under different S/R values.

Download full size

View all figures

To better present the experimental results, we conduct multiple experiments and set the light power in the object plane to 50, 40, 30, 20, and 10 µW to implement the experiments. The effect of different fish thicknesses on bone extraction is also discussed, and the thicknesses of fish are selected as 0.8, 1.4, and 2.1 mm, as shown in Figs. 7(a1)–7(a3). A larger thickness corresponds to a smaller $S / R$ . The signal received by the bucket detector weakens when the light intensity decreases or the fish thickness increases. Therefore, the extraction ability gradually deteriorates when the light intensity varies from 50 to 10 µW and the fish thickness changes from 0.8 to 2.1 mm. It is noted that the fish bone is no longer recognized by the characteristic-enhanced light source when the $S / R$ is lower than approximately 0.8. In other words, the target can still be recognized when the signal is slightly weaker than the noise. The scattering environment closely relates to real imaging scenarios, so our proposed imaging scheme can help target extraction in a strong scattering environment.

Here, it should be emphasized that our proposed method is a pre-processing of the light source, not a post-processing of the imaging result^[38,39] or the object light^[40]. Existing deep learning GI schemes can indeed eliminate the interference and achieve target extraction, but there is also a problem: the training process of deep learning needs a large number of data sets and training periods to ensure the stability of target recognition. Therefore, due to the processing of target information, the existing deep learning GI generally needs larger training sets (thousands of images)^[38], longer training periods (more than 50 periods)^[40], and higher time consumption^[38,40]. However, our method does not focus on the data processing of the detected information, it only enhances the target characteristic response in the light field to realize the target extraction during the imaging process. The training process requires fewer resources: smaller training sets (1000 images), shorter training periods (10–20 periods), less time consumption (only a few minutes), and fewer measurements (500 samples), which is beneficial for real-time imaging.

According to the above analysis, our scheme can be utilized to recognize the targets that own identical characteristics to the training object from a complex scenario. Moreover, the model can eliminate the effect of strong scattering by enhancing the target characteristic response of the light source. Note that the reported studies^[27,28] on object identification based on GI mainly depend on the post-processing of detected signals, which increases the time consumption of image reconstruction. Our method is beneficial for real-time imaging without requiring the post-processing of target signals. Moreover, previous speckle processing^[30] can only improve the imaging quality of the object at a low sampling rate and cannot separate the target from multiple interference targets, which demonstrates the potential for target extraction under complex scenarios.

4. Conclusion

A characteristic imaging scheme based on deep learning to enhance the characteristic response of pseudothermal light in GI has been proposed for target extraction. Since the characteristic of the training target is contained in the new light source, the target can be recognized under low measurements even when the imaging object is covered by occlusions. The simulation and experimental results verify the single- and multiple-target extraction ability of our scheme when there exist interference targets and strong scattering. Our proposed scheme may be a promising target extraction method and has potential applications in complex application scenarios.

Category: Imaging Systems and Image Processing

Received: Jan. 4, 2024

Accepted: Jun. 24, 2024

Posted: Jun. 24, 2024

Published Online: Dec. 31, 2024

The Author Email: Yanfeng Bai (yfbai@hnu.edu.cn), Xiquan Fu (fuxq@hnu.edu.cn)

DOI:10.3788/COL202422.121103

CSTR:32184.14.COL202422.121103