Chinese Optics Letters, Vol. 19, Issue 10, 101101 (2021)
Computational ghost imaging with compressed sensing based on a convolutional neural network
Computational ghost imaging (CGI) has recently been intensively studied as an indirect imaging technique. However, the image quality of CGI cannot yet meet the requirements of practical applications. Here, we propose a novel CGI scheme that significantly improves the imaging quality. In our scenario, the conventional CGI data processing algorithm is replaced with a new compressed sensing (CS) algorithm based on a convolutional neural network (CNN). CS is used to process the data collected by a conventional CGI device, and the processed data are then used to train a CNN that reconstructs the image. The experimental results show that our scheme produces higher-quality images than conventional CGI with the same number of samples. Moreover, detailed comparisons between the images reconstructed with the deep learning approach and with conventional CS show that our method outperforms the conventional approach and yields ghost images of higher quality.
1. Introduction
Ghost imaging is an indirect imaging technique based on quantum properties (e.g., quantum entanglement or intensity correlation) of the light field[
After more than ten years of development, CGI theory and experiments have matured. However, CGI is still at the laboratory stage. One of the critical problems is that the image quality cannot meet the demands of practical applications. To produce a clear image, conventional CGI, like conventional ghost imaging, typically requires tens of thousands of measurements, which obviously cannot satisfy practical applications, especially moving-target imaging. Improving the image quality of ghost imaging is therefore one of the keys to realizing its application. Compressed sensing (CS)[
In this article, we propose a novel CGI scheme with CS based on a convolutional neural network (CNN) to improve the image quality. The setup is based on a conventional CGI experimental apparatus. First, the data collected by the CGI device are compressed by a conventional CS algorithm; then, the processed data are used to train a CNN that reconstructs the ghost image. This scheme combines the low sampling rate of CS with the accurate image reconstruction of a CNN. Theoretical and experimental results show that this scheme is significantly better than conventional CS and a conventional deep learning (DL) algorithm with a CNN under the same amount of data.
2. Theory
We use a conventional CGI experimental device in our work. The setup is shown in Fig. 1. In the setup, a quasi-monochromatic laser illuminates an object, and the reflected light carrying the object information is modulated by a spatial light modulator. A bucket detector collects the modulated light. Correspondingly, the calculated reference light field can be obtained from diffraction theory. The object image can be reconstructed by correlating the signal output by the bucket detector with the calculated signal[
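For reference, the correlation step can be sketched as follows. This is a generic second-order intensity-correlation reconstruction in Python/NumPy, not the authors' code; all array names, shapes, and the toy object are illustrative.

```python
import numpy as np

def cgi_reconstruct(bucket_signals, ref_patterns):
    """Conventional CGI reconstruction via intensity correlation.

    bucket_signals : (M,) array of bucket-detector measurements B_i
    ref_patterns   : (M, H, W) array of computed reference intensity patterns I_i(x, y)
    Returns the correlation image G(x, y) = <B I> - <B><I>.
    """
    B = np.asarray(bucket_signals, dtype=float)
    I = np.asarray(ref_patterns, dtype=float)
    mean_B = B.mean()                                  # <B>
    mean_I = I.mean(axis=0)                            # <I(x, y)>
    corr = np.tensordot(B, I, axes=(0, 0)) / len(B)    # <B I(x, y)>
    return corr - mean_B * mean_I

# Example with random illustrative data
M, H, W = 1000, 64, 64
patterns = np.random.rand(M, H, W)
obj = np.zeros((H, W)); obj[20:40, 20:40] = 1.0        # toy object
buckets = (patterns * obj).sum(axis=(1, 2))            # simulated bucket values
image = cgi_reconstruct(buckets, patterns)
```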
Figure 1.Setup of the CGI system with CS-CNN. SLM, spatial light modulator; BD, bucket detector.
The flow chart of the CS-CNN is shown in Fig. 2. In the following, we briefly introduce this algorithm. It consists of three main parts: (i) a conventional CS program that compresses the data collected by the CGI device; (ii) a conventional CGI processing program; and (iii) a 10-layer CNN constructed to train on the resulting data.
Figure 2.Network structure of the proposed CS-CNN.
In the conventional CGI device, a set of data is measured by the bucket detector. Correspondingly, according to the diffraction theory of light, the distribution of the idle (reference) light field in the object plane can be obtained, so that a full set of data points is available. Each data point is divided into non-overlapping blocks. According to CS theory[
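The block-wise CS measurement described above can be sketched as follows. The block size, measurement rate, and random Gaussian measurement matrix are assumptions, since the paper's original symbols did not survive extraction; this is a minimal illustration, not the authors' implementation.

```python
import numpy as np

def blockwise_cs_measure(image, block=16, rate=0.25, seed=0):
    """Divide an image into non-overlapping block x block patches and take
    compressed measurements y = Phi @ x for each vectorized patch."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    n = block * block
    m = max(1, int(round(rate * n)))                  # measurements per block
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random measurement matrix
    measurements, positions = [], []
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            x = image[r:r + block, c:c + block].reshape(n)  # vectorize patch
            measurements.append(Phi @ x)
            positions.append((r, c))
    return np.stack(measurements), positions, Phi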
A new set of data is obtained by processing the above data with a conventional CGI program. Then, a 10-layer CNN is constructed to train on these data. Layers 1–4 of the network form a stacked autoencoder, and layers 5–10 are convolution layers. The measurement matrix is replaced by the stacked autoencoder, and the input layer receives the data blocks, whose rows are arranged into a column vector. The first layer of the network is fully connected to the column vector converted from the input image block, and its number of neurons is set according to the desired measurement rate, defined as the ratio of the number of first-layer neurons to the length of the input vector. The activation function is a rectified linear unit (ReLU), which outputs a column vector.
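The explicit formula for this layer did not survive extraction. Assuming an n-pixel block vector x and a first layer with m neurons, weights W_1, and biases b_1, a plausible form consistent with the description is:

```latex
y_1 = \mathrm{ReLU}\!\left(W_1 x + b_1\right), \qquad W_1 \in \mathbb{R}^{m \times n},
\qquad \text{measurement rate} = \frac{m}{n}.
```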
The second layer of the network is fully connected to the first layer and has 400 neurons; it takes the output of the first layer as its input, and its activation function is again the ReLU function. In the same way, the third layer is fully connected to the second layer with 100 neurons, and the fourth layer is fully connected to the third layer with 400 neurons. The output vector of the fourth layer is rearranged according to the original rows and columns to obtain the preliminary reconstructed image block.
Finally, the CNN is used to reconstruct the image block accurately. The output of the fourth layer is taken as the input of the fifth layer. In the fifth layer (a convolution layer), sixty-four convolution kernels are used to generate sixty-four feature maps. The sixth layer, connected to the fifth, uses thirty-two convolution kernels to generate thirty-two feature maps. The seventh layer, connected to the sixth, uses one convolution kernel to generate one feature map. The eighth layer, connected to the seventh, uses sixty-four convolution kernels to generate sixty-four feature maps. The ninth layer, connected to the eighth, uses thirty-two convolution kernels to generate thirty-two feature maps. The activation function in all of these layers is the ReLU function. The tenth layer, connected to the ninth, uses one convolution kernel with a zero padding of three, and no activation function is applied at its output, which is the accurately reconstructed image block.
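For concreteness, the 10-layer structure described above can be sketched in PyTorch as follows. This is not the authors' Caffe model: the 20 × 20 block size (so that 400 = 20 × 20) and the convolution kernel sizes are assumptions; only the layer widths (m, 400, 100, and 400 neurons; 64, 32, 1, 64, 32, and 1 kernels) and the zero padding of three in the last layer come from the text.

```python
import torch
import torch.nn as nn

class CSCNN(nn.Module):
    """Sketch of the 10-layer CS-CNN (assumed block size 20x20 and
    assumed kernel sizes; layer widths taken from the description)."""

    def __init__(self, m, block=20):
        super().__init__()
        n = block * block                         # 400 for 20x20 blocks
        self.block = block
        # Layers 1-4: stacked autoencoder (fully connected, ReLU)
        self.encoder_decoder = nn.Sequential(
            nn.Linear(n, m),     nn.ReLU(),       # layer 1: learned measurement
            nn.Linear(m, 400),   nn.ReLU(),       # layer 2
            nn.Linear(400, 100), nn.ReLU(),       # layer 3
            nn.Linear(100, 400), nn.ReLU(),       # layer 4: preliminary block
        )
        # Layers 5-10: convolutional refinement (kernel sizes assumed)
        self.refine = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, padding=3), nn.ReLU(),   # layer 5
            nn.Conv2d(64, 32, kernel_size=1),           nn.ReLU(),   # layer 6
            nn.Conv2d(32, 1, kernel_size=7, padding=3), nn.ReLU(),   # layer 7
            nn.Conv2d(1, 64, kernel_size=7, padding=3), nn.ReLU(),   # layer 8
            nn.Conv2d(64, 32, kernel_size=1),           nn.ReLU(),   # layer 9
            nn.Conv2d(32, 1, kernel_size=7, padding=3),              # layer 10: no activation
        )

    def forward(self, x_blocks):                  # x_blocks: (batch, block*block)
        v = self.encoder_decoder(x_blocks)
        v = v.view(-1, 1, self.block, self.block) # rearrange vector into a block
        return self.refine(v)

# Illustrative usage: a measurement rate of 0.25 gives m = 100 for 400-pixel blocks
net = CSCNN(m=100)
out = net(torch.randn(8, 400))                    # -> (8, 1, 20, 20)
```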
Within the DL framework Caffe, the 10-layer network is trained in an unsupervised way, with a loss function that measures the reconstruction error between the network output and the corresponding target image blocks.
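The displayed loss function did not survive extraction. For an unsupervised block-reconstruction network of this kind, a mean-squared-error loss of the following form is a plausible assumption (not a quotation from the paper):

```latex
L(W, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl\| \hat{x}_i(W, b) - x_i \bigr\|_2^2 ,
```

where x_i is the i-th target image block and \hat{x}_i is the corresponding network output.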
The numbers of input neurons in the first layer and of output neurons in the fourth layer are determined by the block size, as described above. In layers 5–10 of the network, the initial weights follow a Gaussian distribution with a mean of zero and a variance of 0.01. In layers 1–10, the initial bias values are set to zero. After the deep neural network, the reconstructed image blocks are obtained; the blocks are then rearranged according to their original row and column indices to form the full reconstructed image.
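The final reassembly step can be illustrated with a short NumPy helper; the function and variable names are illustrative and tie back to the block-splitting sketch above.

```python
import numpy as np

def assemble_blocks(block_images, positions, image_shape):
    """Place reconstructed non-overlapping blocks back at their original
    (row, col) positions to form the full image (illustrative helper)."""
    out = np.zeros(image_shape, dtype=float)
    for patch, (r, c) in zip(block_images, positions):
        h, w = patch.shape
        out[r:r + h, c:c + w] = patch
    return out
```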
3. Results
The experimental setup is schematically shown in Fig. 1. A standard monochromatic laser (30 mW, Changchun New Industries Optoelectronics Technology Co., Ltd., MGL-III-532) with a wavelength of 532 nm illuminates an object (a Rubik's Cube). The light reflected by the object is focused through a lens onto a two-dimensional amplitude-only ferroelectric liquid crystal spatial light modulator (Meadowlark Optics A512-450-850) with 512 × 512 addressable pixels. A bucket detector collects the modulated light. Correspondingly, the reference signal is computed in MATLAB. The ghost image is reconstructed by the CS-CNN. In this experiment, the sampling rate is , and the number of training sets is 1000.
Figure 3 shows a set of experimental results. Figure 3(a1) is the object. Figures 3(a2)–3(a5) show reconstructed ghost images with different numbers of frames. The results show that the image quality is significantly improved by increasing the number of frames, and high-quality ghost images comparable to classical optical imaging can be produced with little data. To quantitatively analyze the quality of the reconstructed images at different frame numbers, the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as our evaluation indexes. As can be seen from Fig. 3(b), despite the number of samples being very small, the reconstructions are still of reasonable quality.
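For reproducing this evaluation, PSNR and SSIM can be computed, for example, with scikit-image; this is a generic sketch, not the authors' evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(reference, reconstruction):
    """Return (PSNR in dB, SSIM) of a reconstruction against the reference image."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    data_range = ref.max() - ref.min()
    psnr = peak_signal_noise_ratio(ref, rec, data_range=data_range)
    ssim = structural_similarity(ref, rec, data_range=data_range)
    return psnr, ssim
```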
Figure 3.Ghost images reconstructed by CGI with CS-CNN. (a1) Classical image. The numbers of frames in the reconstructed ghost images are (a2) 30, (a3) 50, (a4) 70, and (a5) 90. (b) PSNR and SSIM curves of the reconstructed images with different frame numbers.
We compare the conventional CS, DL, and CS-CNN CGI algorithms based on the same experimental data in Fig. 4. CGI cannot effectively reconstruct the image when the number of frames is less than 100; consequently, no CGI result is shown in Fig. 4. The conventional CS algorithm and the CS-CNN algorithm have the same sampling rate, and the DL algorithm and the CS-CNN algorithm use the same training set size, i.e., 1000. Even when the number of samples is very low, Fig. 4 shows that, for the same number of frames, the image quality obtained by our scheme is the best. The quantitative results (Fig. 5) show that the PSNR of CGI with CS-CNN is on average 28.4% higher than that of CGI with DL for the same reconstructed frame number, and the SSIM increases by 93.8% on average[
Figure 4.Detailed comparison between the ghost images reconstructed using the conventional CS algorithm, DL algorithm, and CS-CNN algorithm. The number of frames is (a) 30, (b) 50, (c) 70, and (d) 90.
Figure 5.PSNR and SSIM curves of reconstructed images of CS, DL, and CS-CNN with different frame numbers.
4. Summary
In summary, we have proposed a novel method to improve the image quality of CGI. This method combines the advantages of the CS algorithm and the CNN algorithm. We analyzed the performance of the conventional CGI, CS, and DL algorithms under the same conditions and observed that our CS-CNN scheme outperforms the other methods, especially when the sampling rate is small. To date, CS based on a CNN is among the best-performing CGI reconstruction approaches. This method provides a promising solution to the challenges that have prevented the use of CGI in practical applications.
[1] T. B. Pittman, Y. H. Shih, D. V. Strekalov, A. V. Sergienko. Optical imaging by means of two-photon quantum entanglement. Phys. Rev. A, 52, R3429(1995).
[2] J. Cheng, S.-S. Han. Incoherent coincidence imaging and its applicability in X-ray diffraction. Phys. Rev. Lett., 92, 093903(2004).
[3] X. H. Chen, Q. Liu, K. H. Luo, L. A. Wu. Lensless ghost imaging with true thermal light. Opt. Lett., 34, 695(2009).
[4] B. I. Erkmen. Computational ghost imaging for remote sensing. J. Opt. Soc. Am. A, 29, 782(2012).
[5] D. Y. Duan, Z. X. Man, Y. J. Xia. Nondegenerate wavelength computational ghost imaging with thermal light. Opt. Express, 27, 25187(2019).
[6] J. H. Gu, S. Sun, Y. K. Xu, H. Z. Lin, W. T. Liu. Feedback ghost imaging by gradually distinguishing and concentrating onto the edge area. Chin. Opt. Lett., 19, 041102(2021).
[7] G. Wang, H. B. Zheng, Z. G. Tang, Y. C. He, Y. Zhou, H. Chen, J. B. Liu, Y. Yuan, F. L. Li, Z. Xu. Naked-eye ghost imaging via photoelectric feedback. Chin. Opt. Lett., 18, 091101(2020).
[8] D. Pelliccia, A. Rack, M. Scheel, V. Cantelli, D. M. Paganin. Experimental X-ray ghost imaging. Phys. Rev. Lett., 117, 113902(2016).
[9] H. Yu, R. Lu, S. Han, H. Xie, G. Du, T. Xiao, D. Zhu. Fourier-transform ghost imaging with hard X rays. Phys. Rev. Lett., 117, 113901(2016).
[10] A. Zhang, Y. He, L. Wu, L. Chen, B. Wang. Tabletop X-ray ghost imaging with ultra-low radiation. Optica, 5, 374(2018).
[11] W. Li, Z. Tong, K. Xiao, Z. Liu, Q. Gao, J. Sun, S. Liu, S. Han, Z. Wang. Single-frame wide-field nanoscopy based on ghost imaging via sparsity constraints. Optica, 6, 1515(2019).
[12] W. Gong, S. Han. High-resolution far-field ghost imaging via sparsity constraint. Sci. Rep., 5, 9280(2015).
[13] J. H. Shapiro. Computational ghost imaging. Phys. Rev. A, 78, 061802(R)(2008).
[14] Y. Bromberg, O. Katz, Y. Silberberg. Ghost imaging with a single detector. Phys. Rev. A, 79, 053840(2009).
[15] O. Katz, Y. Bromberg, Y. Silberberg. Compressive ghost imaging. Appl. Phys. Lett., 95, 131110(2009).
[16] V. Katkovnik, J. Astola. Compressive sensing computational ghost imaging. J. Opt. Soc. Am. A, 29, 1556(2012).
[17] P. W. Wang, C. L. Wang, C. P. Yu, S. Yue, W. L. Gong, S. S. Han. Color ghost imaging via sparsity constraint and non-local self-similarity. Chin. Opt. Lett., 19, 021102(2021).
[18] Z. Chen, J. Shi, G. Zeng. Object authentication based on compressive ghost imaging. Appl. Opt., 55, 8644(2016).
[19] M. Lyu, W. Wang, H. Wang, W. Wang, G. Li, N. Chen, G. Situ. Deep-learning-based ghost imaging. Sci. Rep., 7, 17865(2017).
[20] Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, Z. Xu. Ghost imaging based on deep learning. Sci. Rep., 8, 6469(2018).
[21] T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, T. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, T. Ito. Computational ghost imaging using deep learning. Opt. Commun., 413, 147(2018).
[22] G. Barbastathis, A. Ozcan, G. Situ. On the use of deep learning for computational imaging. Optica, 6, 921(2019).
[23] X. L. Yin, Y. J. Xia, D. Y. Duan. Theoretical and experimental study of the color of ghost imaging. Opt. Express, 26, 18944(2018).
[24] W. J. Jiang, X. Y. Li, X. L. Peng, B. Q. Sun. Imaging high-speed moving targets with a single-pixel detector. Opt. Express, 28, 7889(2020).
[25] D. F. Shi, C. Y. Fan, P. F. Zhang, H. Shen, J. H. Zhang, C. H. Qiao, Y. J. Wang. Two-wavelength ghost imaging through atmospheric turbulence. Opt. Express, 21, 2050(2013).
[26] Y. H. Liu, S. Y. Liu, F. X. Fu. Optimization of compressed sensing reconstruction algorithms based on convolutional neural network. Comput. Sci., 47, 143(2020).
Hao Zhang, Deyang Duan, "Computational ghost imaging with compressed sensing based on a convolutional neural network," Chin. Opt. Lett. 19, 101101 (2021)
Category: Imaging Systems and Image Processing
Received: Jan. 4, 2021
Accepted: Mar. 26, 2021
Posted: Mar. 29, 2021
Published Online: Aug. 16, 2021
The Author Email: Deyang Duan (duandy2015@qfnu.edu.cn)