Chinese Optics Letters, Volume 23, Issue 5, 051101 (2025)

Background removal, computational ghost tracking, and follow-up ghost imaging of a moving target

Wenkai Yu*, Peizhe Zhang, Ying Yang, and Ning Wei
Author Affiliations
  • Center for Quantum Technology Research, and Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurement of Ministry of Education, School of Physics, Beijing Institute of Technology, Beijing 102401, China

    Ghost imaging has shown promise for tracking moving targets, but the requirement for adequate measurements per motion frame and the image-dependent tracking process limit its applications. Here, we propose a background removal differential model based on asymptotic gradient patterns to capture the displacement of a translational target directly from single-pixel measurements. By staring at the located imaging region, we evenly distribute optimally ordered Hadamard basis patterns over the motion frames; thus, the image can be recovered from fewer intraframe measurements with gradually improving quality. This makes ghost tracking and imaging more suitable for practical applications.


    1. Introduction

    Real-time tracking of a moving target is of great importance in many fields, e.g., vehicle monitoring, aircraft entry and exit monitoring, and remote-sensing observation. Traditional target-tracking techniques are mainly based on sequential high-resolution images recorded by pixelated sensors, with the tracking accuracy depending on the image quality and the algorithm performance. By calculating the cross-correlation and autocorrelation between different speckle images captured by a pixelated camera, one can track a target in both lateral and axial directions[1]. Even though it is possible to take many continuous high-quality photos in a short time, this comes with the challenge of massive data processing, which puts great pressure on data storage, transmission, and computation.

    Alternatively, we can use an indirect imaging technology named ghost imaging (GI)[2–5], which retrieves target images via the intensity correlation between modulation patterns and single-pixel (bucket) measurements. It offers more possibilities at invisible wavelengths such as the X-ray[6,7], infrared, and terahertz bands, where high-speed high-resolution cameras are technically unavailable or too expensive. In addition, the configuration can be simplified[8,9] with the help of a programmable spatial light modulator (SLM). In this computational GI scheme, only a single-pixel (bucket) detector is used, whose photon detection efficiency, sensitivity, and readout speed are much better than those of pixelated array detectors. However, since GI is a static imaging method, it is difficult to acquire sufficient measurements of a moving target, resulting in blurred images. To improve the acquisition speed of successive ghost images, one can increase the illumination refresh frequency[10], reduce the pixel size of the motion scene, or decrease the number of intraframe measurements with the help of compressive sampling[11,12], deep learning[13,14], or super sub-Nyquist[15] techniques. Moreover, it has been found that motion prior knowledge[16] and motion estimation[17] are helpful for low-speed tracking. The target trajectory can also be acquired by sequentially matching shifted patterns[18] or projection curves[19].

    If the displacement information can be extracted directly from single-pixel values, it will save a great deal of time and resources while improving tracking quality[20]. Recently, an image-free tracking method[21,22] was proposed to estimate the trajectory of a homogeneous target (regarded as a mass point) in real time, with two coefficients acquired from six Fourier basis patterns per frame per two-dimensional (2D) plane, making full use of the linear phase shift property in the Fourier domain. In the spatial domain, another target-tracking method[23,24] was proposed, using geometric moment patterns[25,26] to track a simple binary target in a pure black background. Soon after, dual-pixel complementary measurements were introduced to reduce the influence of measurement noise on ghost tracking[24,27], but these increase the cost and complexity and bring an optical imbalance problem between the two arms[28]. All of the above ghost-tracking methods can only deal with simple homogeneous targets, are prone to bias in scenes with complex backgrounds, and are sensitive to measurement noise.

    In this paper, to solve these issues, we propose an efficient single-pixel measurement method that can acquire the displacement of a translating complex target in real time, with good robustness to complex backgrounds. By staring at the located area and modulating cake-cutting (CC)-ordered Hadamard basis patterns[15] that are evenly distributed over the motion frames, the target image can be retrieved from fewer intraframe single-pixel measurements via an optimization algorithm, with gradually improving quality. We believe that this method will provide new ideas for background removal in ghost tracking and promote its practical applications.

    2. Method and Principle

    A time-varying scene can be discretized into a series of motion frames, each of $p\ \text{pixel} \times q\ \text{pixel}$. The $\tau$th ($\tau = 1, 2, \ldots, z$) motion frame $O(i,j,\tau)$, captured during a very short time interval, can be regarded as immobile, consisting of a static background part $O_b(i,j)$ and a target part $O_t(i,j,\tau)$, namely, $O(i,j,\tau) = O_b(i,j) + O_t(i,j,\tau)$. The center-of-gravity pixel positions of the target can be defined as
$$X(\tau) = \frac{\sum_{i,j} i\, O_t(i,j,\tau)}{\sum_{i,j} O_t(i,j,\tau)}, \qquad Y(\tau) = \frac{\sum_{i,j} j\, O_t(i,j,\tau)}{\sum_{i,j} O_t(i,j,\tau)},$$
which are rotation invariant. As we know, the geometric moment $m_{\alpha\beta}$ of the 2D motion scene can be written as $m_{\alpha\beta} = \sum_{i,j} i^{\alpha} j^{\beta} O(i,j)$; thus, the denominator and numerator of the center-of-gravity formula are the zero-order and first-order geometric moments, $m_{00}$ and $m_{10}$ (or $m_{01}$). Next, we will rederive the ghost-tracking formula starting from this intrinsic center-of-gravity formula.
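As a concrete illustration of these definitions, the moments and centroid can be computed with a few lines of NumPy. This is a minimal sketch (the function names are ours, not from the paper); the indexing follows the text, with $i$ running horizontally from 1 to $q$ and $j$ vertically from 1 to $p$:

```python
import numpy as np

def geometric_moment(O, alpha, beta):
    """Geometric moment m_{alpha,beta} = sum_{i,j} i^alpha * j^beta * O(i,j).
    i indexes columns (horizontal) and j indexes rows (vertical),
    both 1-based, matching the pattern definitions in the text."""
    p, q = O.shape                      # p rows, q columns
    j, i = np.mgrid[1:p + 1, 1:q + 1]   # j: vertical index, i: horizontal index
    return float(np.sum((i ** alpha) * (j ** beta) * O))

def center_of_gravity(O_t):
    """Centroid (X, Y) of the target-only part O_t, i.e., m10/m00 and m01/m00."""
    m00 = geometric_moment(O_t, 0, 0)
    return (geometric_moment(O_t, 1, 0) / m00,
            geometric_moment(O_t, 0, 1) / m00)

# A 2-pixel target at (i=3, j=4) and (i=5, j=4) -> centroid (4.0, 4.0)
O_t = np.zeros((8, 8))
O_t[3, 2] = O_t[3, 4] = 1.0             # 0-based array indices for the two pixels
print(center_of_gravity(O_t))           # -> (4.0, 4.0)
```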

    Here, we use the same patterns as applied in geometric-moment-based methods: the all-one pattern $P_o$, the horizontal asymptotic gradient pattern $P_h(i,j) = k_h i$, and the vertical asymptotic gradient pattern $P_v(i,j) = k_v j$, where $k_h$ and $k_v$ are constants, $i = 1, 2, \ldots, q$, and $j = 1, 2, \ldots, p$. Before the target enters the scene, we measure the single-pixel (bucket) values $S$ with respect to $P_o$, $P_h$, and $P_v$:
$$S_o^0 = \sum_{i,j} O_b(i,j), \tag{1}$$
$$S_h^0 = \sum_{i,j} k_h i\, O_b(i,j), \tag{2}$$
$$S_v^0 = \sum_{i,j} k_v j\, O_b(i,j). \tag{3}$$

    Since the target is rigid in most circumstances, when the target initially enters the scene, we only need to measure the single-pixel (bucket) value once using $P_o$:
$$S_o^\tau = S_o^1 = \sum_{i,j} O(i,j,1) = S_o^0 + \sum_{i,j} O_t(i,j,1). \tag{4}$$

    According to Eqs. (1) and (4), the ratio of the target intensity to the background can be written as
$$R = \frac{\sum_{i,j} O_t(i,j,\tau)}{\sum_{i,j} O_b(i,j)} = \frac{S_o^1 - S_o^0}{S_o^0}, \tag{5}$$
which is invariant in each motion frame. For scenes with known pixel sizes, we can use this ratio to derive the pixel size $\mu \times \nu$ of the approximately rectangular area occupied by the target, which is slightly larger than $R \cdot (p \times q)$.
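As a toy numerical check of this ratio (the bucket values below are invented for illustration, chosen so that the estimated area matches a 32 pixel × 32 pixel target in a 400 pixel × 400 pixel scene, as in Fig. 1):

```python
p = q = 400                               # scene size in pixels
So0 = 1_000_000.0                         # bucket value of the background alone (pattern P_o)
So1 = 1_006_400.0                         # bucket value once the target has entered

R = (So1 - So0) / So0                     # Eq. (5): target-to-background intensity ratio
est_area = (So1 - So0) * p * q / So0      # = R * (p * q), the target's approximate pixel area
side = round(est_area ** 0.5)             # side of a square staring window around the target
print(R, est_area, side)                  # -> 0.0064 1024.0 32
```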

    Then, for each motion frame, we record single-pixel (bucket) values using the two asymptotic gradient patterns:
$$S_h^\tau = \sum_{i,j} k_h i\, O(i,j,\tau) = S_h^0 + \sum_{i,j} k_h i\, O_t(i,j,\tau), \tag{6}$$
$$S_v^\tau = \sum_{i,j} k_v j\, O(i,j,\tau) = S_v^0 + \sum_{i,j} k_v j\, O_t(i,j,\tau). \tag{7}$$

    Assume that there is no overlap between the target and the background. Then, according to Eqs. (6) and (7), in the current motion frame we can calculate the target's absolute displacement,
$$X(\tau) = \frac{\sum_{i,j} i\, O_t(i,j,\tau)}{\sum_{i,j} O_t(i,j,\tau)} = \frac{1}{k_h} \cdot \frac{S_h^\tau - S_h^0}{S_o^1 - S_o^0}, \tag{8}$$
$$Y(\tau) = \frac{\sum_{i,j} j\, O_t(i,j,\tau)}{\sum_{i,j} O_t(i,j,\tau)} = \frac{1}{k_v} \cdot \frac{S_v^\tau - S_v^0}{S_o^1 - S_o^0}, \tag{9}$$
and its relative displacement (assuming $S_o^\tau \approx S_o^{\tau-\Delta\tau}$),
$$\Delta X(\tau) = X(\tau) - X(\tau-\Delta\tau) \approx \frac{1}{k_h} \cdot \frac{S_h^\tau - S_h^{\tau-\Delta\tau}}{S_o^\tau - S_o^0}, \tag{10}$$
$$\Delta Y(\tau) = Y(\tau) - Y(\tau-\Delta\tau) \approx \frac{1}{k_v} \cdot \frac{S_v^\tau - S_v^{\tau-\Delta\tau}}{S_o^\tau - S_o^0}. \tag{11}$$
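This pipeline, Eqs. (1)–(9), can be verified end-to-end with a small simulation. The following is a minimal sketch with our own toy scene (scene size, target size, and random seed are our choices); the target is added on top of the background, so the background terms cancel exactly in the differential formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
p = q = 64
kh = kv = 1.0

j, i = np.mgrid[1:p + 1, 1:q + 1]        # j: vertical (1..p), i: horizontal (1..q)
Po, Ph, Pv = np.ones((p, q)), kh * i, kv * j

Ob = 0.2 * rng.random((p, q))            # static complex background

def scene(x, y):
    """Frame with a homogeneous 4x4 target whose top-left pixel is (i=x, j=y)."""
    O = Ob.copy()
    O[y - 1:y + 3, x - 1:x + 3] += 1.0
    return O

def bucket(O, P):
    """Single-pixel (bucket) value: total intensity of the pattern-modulated scene."""
    return float(np.sum(P * O))

# Background calibration before the target enters (Eqs. (1)-(3)),
# then one P_o measurement after it enters (Eq. (4))
So0, Sh0, Sv0 = bucket(Ob, Po), bucket(Ob, Ph), bucket(Ob, Pv)
So1 = bucket(scene(10, 20), Po)

for (x, y) in [(10, 20), (30, 40)]:      # two motion frames
    O = scene(x, y)
    X = (bucket(O, Ph) - Sh0) / (kh * (So1 - So0))   # Eq. (8)
    Y = (bucket(O, Pv) - Sv0) / (kv * (So1 - So0))   # Eq. (9)
    print(X, Y)                           # centroid of the 4x4 target: (x + 1.5, y + 1.5)
```

Only two gradient-pattern measurements per frame are needed here; the background contribution is removed entirely by the calibration terms $S_h^0$, $S_v^0$, and $S_o^0$.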

    With the estimated target pixel size and center-of-gravity positions, we utilize optimally CC-ordered Hadamard basis patterns[15] that are evenly distributed to each motion frame and only modulated in the staring target region. The CC patterns are generated by sorting the Hadamard basis patterns in an ascending order of the number of 2D connected regions of these patterns. Using CC patterns, the low-frequency sampling is done first; then it gradually proceeds to high-frequency sampling, which fits quite well with the idea of gradual imaging[20,29] (i.e., the image gradually becomes clearer from a blurred outline). Finally, a TVAL3 solver[30] is applied here for real-time gradual image reconstruction. Here, single-pixel imaging reconstruction refers to the acquisition of image information by solving a system of linear equations between modulation patterns and single-pixel values.
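The CC sorting criterion (ascending number of 2D connected regions) can be sketched with SciPy's connected-component labeling. This is our illustrative implementation, not the authors' code; it assumes a square image whose side is a power of two:

```python
import numpy as np
from scipy.linalg import hadamard
from scipy.ndimage import label

def cc_ordered_patterns(n):
    """Return the n*n two-dimensional Hadamard basis patterns sorted in ascending
    order of their number of connected regions (the 'cake-cutting' order).
    Each 2D basis pattern is the outer product of two rows of the Hadamard matrix."""
    H = hadamard(n)                        # n x n matrix with +1/-1 entries
    patterns = [np.outer(H[u], H[v]) for u in range(n) for v in range(n)]
    def n_blocks(P):
        # regions of the +1 part plus regions of the -1 part (4-connectivity)
        return label(P > 0)[1] + label(P < 0)[1]
    return sorted(patterns, key=n_blocks)  # low spatial frequency first

pats = cc_ordered_patterns(8)
print(len(pats))                # 64 basis patterns for an 8x8 image
print((pats[0] > 0).all())      # the first pattern is the all-one (DC) pattern -> True
```

Because the low-block-count (low-frequency) patterns come first, truncating the sequence at any point yields the blurred-outline-first behavior of gradual imaging described above.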

    3. Simulation Results

    In the simulation, we designed a motion scene [see Fig. 1(a1)] in which a car moved in a complex background. Green arrows indicate the direction in which the car moves, and red dots denote its center-of-gravity positions. Here, $k_h = k_v = 1$. The recovered trajectory is given in Fig. 1(a2); it perfectly coincides with the real one, indicating that our method can track a moving target in a complex background.


    Figure 1.Simulation results of background removal, computational ghost tracking, and follow-up ghost imaging. (a1) Schematic diagram of the motion scene; (a2) and (b1)–(b12) recovered trajectory and imaging results of the moving target (occupying 32 pixel × 32 pixel in the scene of 400 pixel × 400 pixel) using our method; (c1)–(c2) performance curves using different numbers of imaging patterns per position.


    Figure 2.Comparisons of target trajectory recovery results in the absence and presence of background. (a1) and (a2)–(a4) Motion scenes with pure black background and with complex background, respectively; (b1)–(b4), (c1)–(c4), and (d1)–(d4) target trajectories recovered by different methods.

    In Figs. 1(b1)–1(b12), each row gives results at the first, third, fifth, and tenth motion frames, respectively, using different numbers of imaging patterns. On the right side of each subfigure, the corresponding sampling ratio (SR), peak signal-to-noise ratio (PSNR)[15], and mean structural similarity (MSSIM)[31] values are listed. The corresponding PSNR and MSSIM curves are given in Figs. 1(c1) and 1(c2).

    It can be seen from Figs. 1(b1)–1(b12), 1(c1), and 1(c2) that as the target moved, more imaging patterns became involved, which improved the imaging quality. Moreover, the more imaging patterns were used in each motion frame, the clearer the final reconstructed image. Since only two or three modulated patterns were needed in each motion frame to extract the absolute or relative displacement, and only a few evenly allocated CC-ordered Hadamard basis patterns were required for imaging, the measurement was highly efficient as long as the modulation frequency was high enough.

    We compared our method with two mainstream image-free ghost-tracking methods. The first was the tracking method based on geometric moments[23,24]; the second was based on Fourier three-step phase shifting[21,22], in which six Fourier basis patterns were used for each 2D plane to track the target. The simulation comparison results are presented in Fig. 2. When there was no background in the motion scene and only a car was moving, as shown in Fig. 2(a1), our method could perfectly recover the moving trajectory [see Fig. 2(b1)], while the trajectories reconstructed by the other two methods both deviated to some extent [see Figs. 2(c1) and 2(d1)]. This indicates that the latter two methods are not suitable for a target with a complex structure, because they both treat the target as a simple homogeneous object with a single gray value, without considering that the target may have a spatial intensity distribution. We then added some background elements to the motion scene, as shown in Figs. 2(a2)–2(a4); the proposed method still tracked the target well in real time in the complex background, as shown in Figs. 2(b2)–2(b4). In contrast, the geometric-moment-based tracking results exhibited large scaling deviations, as shown in Figs. 2(c2)–2(c4). The Fourier three-step phase-shifting tracking results showed relatively slight deviations [see Figs. 2(d2)–2(d4)] compared with the former, because this method takes the background into account.

    We can conclude from the simulation results that the proposed method removes the effect of background very well and can be applied to the moving target with a complex structure compared to the other two approaches. As the trajectory is extracted more accurately, it is also easier to perform subsequent follow-up GI.

    4. Experimental Results

    The schematic diagram of our method is given in Fig. 3. The light emitted from a thermal light source (Thorlabs SLS201L/M, with wavelengths ranging from 300 to 2600 nm, a bulb electrical power of 9 W, and an output power stability of 0.05%) was collimated via a beam expander and attenuated by neutral density filters. It was then projected onto the first digital micromirror device (DMD), which was encoded with tracking and imaging patterns to perform spatial light modulation on the incident light. To facilitate accurate control of the pixelated movement of the moving target, we used a second DMD to display the motion scenes, which is a regular practice in ghost tracking and imaging experiments[13,20,24,32] to demonstrate methodological feasibility and quantitatively assess errors. Both DMDs had a diagonal size of 17.78 mm and a resolution of 768 pixel × 1024 pixel, with a maximum modulation frequency of 32,552 Hz. Once a DMD started working, all pixels were flipped up. When we selected a working area on the DMD, we only needed to set the pixels outside that area to zero (these pixels still switched in operation); thus, the peak modulation frequency could always be achieved. The structured light passed through an imaging lens and was projected onto the second DMD. Then, the total intensity of the displayed motion scene, overlapped with the modulated light field, was recorded by a high-speed counter-type Hamamatsu H10682-210 photomultiplier tube (PMT), acting as a single-pixel detector.


    Figure 3.Experimental setup of background removal, computational ghost tracking, and follow-up ghost imaging. The collimated thermal light is modulated by the first DMD using tracking and imaging patterns, then projected onto the second DMD (encoded with the moving scenes), and the total intensity is collected by a PMT (acting as a single-pixel detector).

    Since each micromirror of the DMD was oriented at ±12° with respect to the normal direction of the working plane, corresponding to two states (1 or 0), only binary 0-1 patterns could be loaded onto the DMD. To realize precise gray-scale modulation, we used the pulse width modulation (PWM) technique. Generally, the gray values range from 0 to 255; thus, 255 frames of binary 0-1 patterns were required by the PWM method to display the asymptotic gradient patterns. For example, assuming the gray value of one pixel in the gray-scale pattern was 150, then 150 of the 255 frames needed to display "1" at this pixel, and the remaining 105 frames needed to display "0" at this pixel. The PMT continued to integrate during the modulation period of these 255 patterns and recorded a total photon count.
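The PWM expansion described above can be sketched as follows (illustrative code, not the instrument driver): a pixel of gray value g is "on" in exactly g of the 255 binary frames, so the time-integrated bucket signal is proportional to the gray-scale pattern.

```python
import numpy as np

def pwm_frames(gray_pattern):
    """Expand an 8-bit gray-scale pattern into 255 binary DMD frames.
    Frame k (k = 1..255) displays '1' at every pixel whose gray value is >= k,
    so each pixel is on for exactly (gray value) frames out of 255."""
    g = np.asarray(gray_pattern, dtype=np.uint8)
    levels = np.arange(1, 256).reshape(-1, 1, 1)     # thresholds 1..255
    return (g[None, :, :] >= levels).astype(np.uint8)

gray = np.array([[150, 0], [255, 64]], dtype=np.uint8)
frames = pwm_frames(gray)
print(frames.shape)          # (255, 2, 2)
print(frames.sum(axis=0))    # integrated counts reproduce the gray values
```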

    As shown at the bottom of Fig. 3, the modulation patterns were arranged as follows. Before the target entered the scene (i.e., in the initial scene), a set of $P_o$, $P_h$, and $P_v$ was modulated once, or a few times for averaging over multiple measurements. After the target entered the scene, we only needed to modulate $P_o$ once again. These four patterns provided the a priori information about the background. Then, the patterns $P_h$ and $P_v$ together with a few CC patterns were regarded as a group to be modulated in each motion frame. The first six patterns were sufficient to acquire the first center-of-gravity position, according to which the imaging patterns in each following group could be generated in real time. The number of imaging patterns in each group could be adjusted according to the moving speed of the target, with at least one pattern per group.

    The experimental results are presented in Fig. 4. The same motion scene as given in Fig. 1(a1) was displayed on the second DMD [see Fig. 4(a)]. The recorded photon counts are plotted in Figs. 4(b1) and 4(c1). Due to the unavoidable experimental measurement noise, the positioning based on these counts was out of alignment, as shown in Figs. 4(b2) and 4(c2), which would further affect the quality of subsequent follow-up GI. Therefore, we added a correction step to reduce the influence of the experimental noise. To be specific, we assumed that there was a minimum unit-step size and that any step was an integer multiple of this unit-step. We then compared the calculated x (or y) coordinate with the minimum unit-step, rounded the ratio to the nearest integer, and converted this integer multiple back to the corresponding pixel coordinates, achieving the effect of data correction. This is another innovation of this work. Here, we set the minimum unit-step on both the x axis and the y axis to 66. The maximum step in this experiment was two unit-steps on the x axis and one unit-step on the y axis, i.e., the target moved about 148 pixel diagonally. The maximum tracking velocity was thus about 9446 or 6298 pixel/s at the maximum DMD modulation frequency of 32,552 Hz for extracting the absolute or relative displacement of the target using PWM. If a DMD with a commercial maximum resolution of 2560 pixel × 1600 pixel were used, the theoretical maximum tracking velocity could reach 163,398 or 108,932 pixel/s for absolute or relative displacement calculation. According to the corrected tracking results, shown in Figs. 4(b3) and 4(c3), we merged the corrected coordinates on both axes into actual spatial coordinates and connected them to form the target's moving trajectory [see Fig. 4(d)], which was exactly consistent with the original one. Next, we present the results of follow-up GI at positions 1, 3, 5, 7, and 9 in Figs. 4(e1)–4(e5), respectively. Here, 12 modulated patterns were added in each motion frame for follow-up GI immediately after the tracking patterns were encoded. In the lower left corner of each reconstructed image, we mark the SR used for the current staring gradual imaging. The computational complexities of ghost tracking (involving only two simple functions) and follow-up imaging were O(1) and O(N log N), respectively, where N = 64 × 64 in the experiment. The running time for ghost tracking was far below 1 ns (negligible), and the processing time for the imaging task was about 0.185 s per motion frame. For tasks with strict real-time requirements, the image could be sampled first and reconstructed afterwards, with no impact on real-time positioning and tracking. Therefore, "real time" in this paper means that the tracking can be performed in quasi-real time. If we used a high-performance graphics workstation or applied a correlation function (linear sum averaging) instead of an iterative algorithm, the follow-up ghost imaging could also be done in quasi-real time. The above experimental results further verified the feasibility of the proposed method.
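The unit-step correction can be sketched as follows (the noisy coordinates below are invented for illustration): each noisy coordinate is divided by the unit-step, rounded to the nearest integer multiple, and mapped back to pixel coordinates.

```python
import numpy as np

def snap_to_grid(coords, unit_step=66):
    """Round noisy pixel coordinates to the nearest integer multiple of the
    minimum unit-step, then convert the multiples back to pixel coordinates."""
    coords = np.asarray(coords, dtype=float)
    return np.rint(coords / unit_step) * unit_step

noisy_x = [3.1, 64.8, 131.7, 199.2]      # hypothetical noisy x positions
print(snap_to_grid(noisy_x))             # -> [  0.  66. 132. 198.]
```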


    Figure 4.Experimental results of background removal, computational ghost tracking, and follow-up ghost imaging. (a) Original motion scene; (b1)–(b3) and (c1)–(c3) processing operations performed on single-pixel values S(x) and S(y), respectively; (d) merged trajectory result of (b3) and (c3); (e1)–(e5) staring gradual imaging results at different positions together with their SRs, PSNRs, and MSSIMs.


    Figure 5.Trajectory recovery and gradual imaging of motion scenes. (a)–(c) Original motion scenes with different moving targets; (d1) and (d2) recovered trajectories using our method and geometric moment-based ghost-tracking method; (e1)–(e5), (f1)–(f5), and (g1)–(g5) restored images of these three motion scenes at different positions using our method; (h1)–(h5) recovered images of the third target using traditional correlation function. The recovered images are marked with their SRs, PSNRs, and MSSIMs.

    Next, we changed the background, chose three different moving targets, and present their experimental results in Fig. 5. The new background was a maze, and the targets were a cloud, a spacecraft, and a shopping cart, as shown in Figs. 5(a)–5(c). Since the geometric-moment-based ghost-tracking method used the same patterns as ours, we compared the performance of the proposed method to this method for the sake of fairness. The trajectories of the different moving targets recovered by our method were identical, all completely coincident with the original one, as shown in Fig. 5(d1), while the trajectory recovered by the geometric-moment-based ghost-tracking method deviated dramatically from the original one [see Fig. 5(d2)], which directly led to the failure of follow-up imaging. The follow-up GI results of the three targets at positions 1, 3, 5, 7, and 9 using our method are given in Figs. 5(e1)–5(e5), 5(f1)–5(f5), and 5(g1)–5(g5). They show that the proposed method is universal for tracking moving targets of different shapes and types. For comparison, we also provide traditional gradual imaging[20,29] results using the correlation function of differential GI[33] in Figs. 5(h1)–5(h5), which are of poor quality. These results highlight the effectiveness and innovativeness of the proposed method.

    5. Discussion and Conclusion

    In this paper, a background removal, computational ghost-tracking, and follow-up GI method is proposed. It is assumed that the moving target is a translational rigid body without deformation. In each motion frame, by modulating the all-one pattern and the horizontal and vertical asymptotic gradient patterns, and using the background removal center-of-gravity pixel positioning formulas, the spatial pixel coordinates of the moving target can be precisely calculated from single-pixel (bucket) measurements, and then its trajectory can be recovered. Since only two asymptotic gradient patterns are used to acquire the target's position in each motion frame, it takes less than 5 µs to modulate these two patterns at the highest modulation rate of a commercial DMD; the calculation of the center of gravity involves only a simple mathematical function, so the recovery of the moving trajectory can be done in real time. This method is applicable to targets with nonuniform motion, with no limitations in this regard. The topographical features of the target gradually become clearer as the movement progresses. Furthermore, a data correction strategy based on a unit-step is designed to reduce the influence of the measurement noise, which makes our method robust to measurement noise and minimizes the position calculation error.

    Different from the geometric-moment-based target-tracking method, the proposed method fully considers the total light intensity of static background in its formulas of center-of-gravity pixel positioning. Thus, in calculations, it can eliminate the influence of positioning deviation and trajectory scaling caused by the background, and the background is no longer required to be pure black. In addition, the tracking target can be complex with a certain shape, instead of the single homogeneous target (being regarded as a mass point) required in both geometric-moment-based and phase-shifting-based ghost-tracking methods. The potential limitations of this proposed technology are that all results are obtained assuming the background is stationary under constant light illumination, and the target is a translating rigid body without any deformation. However, in practical applications, the background will be affected by wind and light changes, and there may be rotation and scaling of the moving target. In the face of these potential challenges, our future work will focus on the robustness enhancement of the approach.

    Both simulation and experimental results have demonstrated the feasibility and performance of our proposed method. We believe that this efficient technology will provide inspiration and new ideas for high-speed computational ghost tracking and imaging. Furthermore, it extends the ghost tracking of a simple centroid target in a pure black background to both fast ghost tracking and follow-up imaging of a rigid body target of complex shape in a complex background, which has more application potential.

    [25] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inform. Theory 8, 179 (1962).

    [30] C. Li, "An efficient algorithm for total variation regularization with applications to the single pixel camera and compressive sensing," M.S. thesis (Rice University, 2010).

    Paper Information

    Category: Imaging Systems and Image Processing

    Received: Sep. 20, 2024

    Accepted: Nov. 12, 2024

    Published Online: Apr. 30, 2025

    The Author Email: Wenkai Yu (yuwenkai@bit.edu.cn)

    DOI:10.3788/COL202523.051101

    CSTR:32184.14.COL202523.051101
