Photonics Research, Volume 13, Issue 7, 1936 (2025)

Unsupervised reconstruction with low-rank tensor embedding based on spatial-intensity-temporal constraints for compressed ultrafast photography

Haoyu Zhou1,2, Zhiming Yao2, Wenpei Yang2, Dongwei Hei2, Yang Li2, Baojun Duan2, Yinong Liu1, Liang Sheng2,3,*, and Yan Song2,4,*
Author Affiliations
  • 1Department of Engineering Physics, Tsinghua University, Beijing 100084, China
  • 2National Key Laboratory of Intense Pulsed Radiation Simulation and Effect, Northwest Institute of Nuclear Technology, Xi’an 710024, China
  • 3e-mail: shengliang@nint.ac.cn
  • 4e-mail: songyan@nint.ac.cn

    Compressed ultrafast photography (CUP) is a computational imaging technique that can simultaneously achieve an imaging speed of $10^{13}$ frames per second and a sequence depth of hundreds of frames. It is a powerful tool for observing unrepeatable ultrafast physical processes. However, since the forward model of CUP is a data compression process, the reconstruction is an ill-posed problem. This hampers the practical application of CUP, especially for scenes with complex temporal behavior, high noise levels, and high compression ratios. In this paper, a CUP system model based on spatial-intensity-temporal constraints is proposed: an additional charge-coupled device (CCD) camera constrains the spatial and intensity behaviors of the dynamic scene, and an additional narrow-slit streak camera constrains its temporal behavior. Additionally, an unsupervised deep learning CUP reconstruction algorithm with low-rank tensor embedding is proposed. The algorithm enhances the low-rankness of the reconstructed image by maintaining the low-rank structure of the dynamic scene and effectively utilizes the implicit prior information of the neural network and the hardware physical model. The proposed joint learning model enables high-quality reconstruction of complex dynamic scenes without training datasets. Simulation and experimental results demonstrate the application prospects of the proposed joint learning model in imaging complex ultrafast physical phenomena.

    1. INTRODUCTION

    It is of great significance for analyzing transient physical phenomena to image ultrafast processes continuously on the nanosecond or even picosecond scale in a single snapshot. The streak camera [1] is an imaging tool that can detect nanosecond and even picosecond processes. The entrance slit of a streak camera is kept narrow to avoid spatial-temporal aliasing, and the temporal information of the dynamic scene is converted into spatial information by the scanning voltage. Therefore, the streak camera achieves high-temporal-resolution imaging at the expense of a smaller imaging field of view (FoV). Due to the limitation of the narrow entrance slit, continuous imaging of 2D dynamic scenes requires the event to be repeatable and the FoV to be moved at every imaging step, which is inconvenient in practice and rules out unrepeatable phenomena. Compressed ultrafast photography (CUP) [2–4] is a new passive ultrafast imaging technology with a frame rate of 10 trillion frames per second and a sequence depth of hundreds of frames. In CUP, the entrance slit of the streak camera is fully opened, and a pseudo-random binary coding plate is placed in front of the slit to encode the dynamic scene at different moments. The encoded dynamic scenes are sheared by the scanning voltage, spatially-temporally integrated, and then collected by the internal CCD camera to obtain a 2D compressed image. Based on compressed sensing [5,6] and streak imaging techniques [1], CUP transforms the 3D data cube into a 2D compressed image in a single snapshot and restores the dynamic scene through various reconstruction algorithms. This technique has great advantages in recording unrepeatable self-luminous physical phenomena.

    However, although CUP has great advantages in terms of frame rate and sequence depth, recovering the dynamic scene is an ill-posed problem due to the high compression ratio and lossy coding in the imaging process. The spatial resolution of the reconstructed image obtained by the two-step iterative shrinkage/thresholding (TwIST) algorithm [2,7] is unsatisfactory, which hinders the practical application of CUP. To solve this problem, the reconstruction process is optimized from both the hardware and software sides. In terms of hardware, the dual-channel CUP proposed by Yao et al. [8] and the multichannel CUP proposed by Yao et al. [9] both use additional channels to increase the sampling rate of CUP. Liang et al. [10] proposed a complementary dual-channel lossless CUP, and Zhou et al. [11] proposed a time-unsheared image constraint CUP; both add an additional charge-coupled device (CCD) camera to collect the integral image, thus providing a spatial-intensity constraint for the reconstruction process and improving the reconstruction quality. He et al. [12] proposed a multimodal fusion-based CUP by adding a transient imaging channel to the systems of Refs. [10,11]; it acquires several framing images of the dynamic scene, thus providing strong constraints for certain frames and achieving good results. In addition, Yang et al. [13] proposed a genetic algorithm for optimizing the coding masks to increase the sampling rate of CUP, which can also be regarded as a hardware optimization. As for software optimization, prior information is crucial in the reconstruction process; it comes either from model-based hand-crafted optimization or from data-driven deep learning algorithms. Model-based optimization narrows down the range of potential solutions by considering the characteristics of the natural image itself. For example, TV-BM3D [14] combines the sparsity of the gradient domain with the non-local similarity of natural images to constrain the reconstruction process, and weighted nuclear norm minimization (WNNM) [15] models the image as a low-rank structure in the reconstruction process to promote the low-dimensionality of images. Many pre-trained denoising networks can also act on the reconstruction: the plug-and-play (PnP) structure allows advanced denoisers such as FFDNet [16], FastDVDnet [17], and DRUnet [18] to be embedded as implicit prior information into the model by combining them with the ADMM [19] or GAP [20] frameworks. However, these hand-crafted optimizations are inadequate for a wide variety of CUP applications due to their limited ability to capture the high-frequency details of images. Data-driven deep learning algorithms [21,22] are trained on large datasets to build a mapping between inputs and outputs and use it as the prior, so they usually perform well on data similar to the training data. However, if the applied data differs greatly from the training data, the trained network may no longer be applicable, which significantly increases the training cost. In addition, it is not easy to acquire high-quality datasets in some situations, such as Z-pinch [23], which greatly limits the application scope of data-driven deep learning algorithms.

    The untrained neural network (UNN) [24] is a deep learning approach that requires no training datasets; representative examples are the deep image prior (DIP) [25] and the deep decoder (DE) [26]. UNN employs the neural network itself to capture a large amount of low-level image statistics as a prior and learns appropriate network parameters from the degraded images. It has proven to be a powerful tool for solving the reconstruction problems of computational imaging methods such as hyperspectral image super-resolution [27], magnetic resonance imaging [28], structured illumination microscopy (SIM) [29], and coherent phase imaging [30]. However, existing UNN algorithms for CUP [11,31] still find it difficult to restore spatially-temporally superimposed compressed images when facing dynamic scenes with complex temporal behaviors or multiple objects. In such cases, constraints on the temporal behavior are necessary. D-HAN [32] is a supervised deep learning network that precisely models the shearing operation in a nonlinear polynomial framework, significantly improving the quality of the reconstructed images. In this paper, we combine an improved UNN with an innovative CUP system to propose a spatial-intensity-temporal cooperative constrained CUP (SITC-CUP) joint learning model. Specifically, we add an ordinary narrow-slit streak camera to the TUIC-CUP system [11] proposed by Zhou et al. The narrow-slit streak camera constrains the 1D spatial information and the temporal behavior of the dynamic scene, while the additional external CCD camera takes time-unsheared images to constrain the spatial and intensity information of the dynamic scene. In this way, the spatial-intensity and temporal behaviors of the system are synergistically constrained to recover the dynamic scene with high fidelity. In addition, we propose a low-rank factorization embedding UNN model for this system. The images are fed into the designed unsupervised deep learning framework in the form of a low-rank tensor factorization and maintain a low-hierarchical-tubal-rank structure [33] during learning. The prior information collected by the proposed hardware system guides the updating of the neural network parameters. By combining the SITC-CUP system with the low-rank factorization embedding UNN model, the reconstruction quality is substantially improved. Simulation results show that the proposed method achieves the best results compared with other algorithms, both for natural image scenes and for LED on/off scenes. In the actual LED on/off CUP experiment, the proposed method is able to reconstruct dynamic scenes with complex temporal behaviors that roughly match the actual ones.

    2. PRINCIPLES

    A. System Model

    The equipment diagram of the proposed SITC-CUP system is shown in Fig. 1(a). First, the dynamic scene passes through a beam splitter that divides the light into two beams. One beam is encoded by a coding plate in front of a wide-slit streak camera, sheared by the scanning voltage, and integrated spatially-temporally; the optical signal is then collected by the internal CCD camera to form a single 2D compressed image. The other beam is split again by a second beam splitter: one part is directly collected by an external CCD camera to obtain a time-integrated image, and the other is collected by a streak camera with a narrow entrance slit to realize continuous-time imaging of a 1D FoV of the dynamic scene. Mathematically, the process of the streak camera with the entrance slit fully open can be expressed as

$$E(x,y) = \mathbf{TSC}\, I(x,y,t) + n, \tag{1}$$

where $I$ denotes the original dynamic scene, $x$ and $y$ are the spatial dimensions, $t$ is the time dimension, $E$ denotes the snapshot finally collected by the internal CCD camera of the streak camera after this series of processes, $\mathbf{C}$ denotes the spatial encoding operator, $\mathbf{S}$ denotes the temporal shearing operator of the streak camera, $\mathbf{T}$ denotes the spatial-temporal integration in the internal CCD camera, and $n$ denotes the noise in the collection process. Writing $\mathbf{O} = \mathbf{TSC}$, Eq. (1) becomes

$$E(x,y) = \mathbf{O}\, I(x,y,t) + n, \tag{2}$$

where $E\in\mathbb{R}^{N_{xy}}$, $\mathbf{O}\in\mathbb{R}^{N_{xy}\times N_{xyt}}$, $I\in\mathbb{R}^{N_{xyt}}$, and $n\in\mathbb{R}^{N_{xy}}$; $N_x$, $N_y$, and $N_t$ denote the numbers of discretized pixels along the $x$, $y$, and $t$ coordinates, and $N_{xy}$ and $N_{xyt}$ denote $N_x\times N_y$ and $N_x\times N_y\times N_t$, respectively. Correspondingly, the projection direction of the dynamic scene onto the external CCD camera is parallel to the $t$ axis, which can be expressed as

$$E_{\mathrm{ccd}}(x,y) = \mathbf{T}\, I(x,y,t) + n, \tag{3}$$

where $E_{\mathrm{ccd}}$ denotes the scene collected by the external CCD camera and $\mathbf{T}$ denotes the spatial-temporal integration in the external CCD camera. For the narrow-slit streak camera, the process can be expressed as

$$E_{\mathrm{n}}(x,y) = \mathbf{TS}\, I(y,t) + n, \tag{4}$$

where $E_{\mathrm{n}}$ denotes the scene collected by the narrow-slit streak camera, $I(y,t)$ denotes the continuous original dynamic scene within the 1D FoV, and $\mathbf{TS}$ denotes the conversion of temporal information into spatial information within the streak camera.
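    To make the three measurement processes concrete, the sketch below implements Eqs. (1)-(4) as discrete operations on an (Nt, Ny, Nx) data cube. It is a minimal NumPy sketch under our own assumptions (unit magnification and a one-pixel horizontal shear per frame); the function names cup_forward, ccd_forward, and slit_forward are illustrative and not taken from the authors' code.

```python
import numpy as np

def cup_forward(scene, mask):
    """Wide-slit channel, Eqs. (1)-(2): encode (C), shear (S), integrate (T).
    scene: (Nt, Ny, Nx) dynamic scene; mask: (Ny, Nx) coding-plate transmittance."""
    nt, ny, nx = scene.shape
    snapshot = np.zeros((ny, nx + nt - 1))
    for t in range(nt):
        snapshot[:, t:t + nx] += scene[t] * mask   # one-pixel shear per frame, then temporal sum
    return snapshot

def ccd_forward(scene):
    """External CCD channel, Eq. (3): time integration only (T)."""
    return scene.sum(axis=0)

def slit_forward(scene, cols):
    """Narrow-slit streak channel, Eq. (4): the slit FoV (a few columns, no coding)
    is sheared and integrated, giving a y-t streak image."""
    strip = scene[:, :, cols]                      # (Nt, Ny, Nc) signal within the 1D FoV
    nt, ny, nc = strip.shape
    streak = np.zeros((ny, nc + nt - 1))
    for t in range(nt):
        streak[:, t:t + nc] += strip[t]
    return streak
```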

    Figure 1.(a) The equipment diagram of the proposed SITC-CUP system. (b) From top to bottom: the imaging models of the wide-slit streak camera, the narrow-slit streak camera, and the CCD camera.

    It can be seen that the number of elements in the 3D dynamic scene is much larger than the number of elements collected by the cameras of the system, so the inverse problem of the wide-slit streak camera's signal collection is ill-posed. According to compressed sensing (CS) theory [5], we can solve the inverse problem of Eq. (2) and obtain the original dynamic scene by solving the following least-squares optimization problem:

$$\hat{I} = \arg\min_{I} \|E - \mathbf{O}I\|_2^2 + \lambda R(I), \tag{5}$$

where the fidelity term ensures the consistency of the dynamic scene with the measurement and $R(I)$ is a regularization term used to limit the signal space to a proper range.

    B. Proposed Method

    Deep learning algorithms are playing an increasingly important role in the field of imaging. Supervised deep learning algorithms design different neural networks and update the network parameters from a large number of training samples, thus establishing a nonlinear mapping between the network inputs and outputs. The optimization function can be written as

$$\hat{\Theta} = \arg\min_{\Theta} \mathrm{Loss}\big(f_{\Theta}(y), x\big), \tag{6}$$

where $\Theta$ denotes the parameters of the neural network and $f_{\Theta}(y)$ denotes the network output parameterized by $\Theta$. Inspired by supervised deep learning, Ulyanov et al. found that the convolutional neural network (CNN) structure itself can serve as prior information in image restoration problems; the network parameters can be optimized from the degraded image without training datasets, a method called the deep image prior (DIP) [25].

    Unlike DIP, which uses a CNN structure as the prior, manifold learning [31,34] is a UNN algorithm that uses an autoencoder composed of fully connected layers as the network structure. The patched images are embedded into the autoencoder, and the vectorized patches are mapped into a low-dimensional space and then restored, thus achieving image restoration. The optimization function can be written as

$$\hat{Z},\hat{\Theta} = \arg\min_{Z,\Theta} E\big(f_{\Theta}(Z), x_0\big), \quad \text{s.t. } \hat{x} = f_{\hat{\Theta}}(\hat{Z}), \tag{7}$$

where $E(f_{\Theta}(Z), x_0)$ is the energy function corresponding to a specific task, $Z$ denotes the random input noise, and $f_{\Theta}(Z)$ is the output of the neural network. Unlike DIP, both the input $Z$ and the network parameters in manifold learning are learned and optimized from the degraded images. Different inputs therefore also affect the final convergence result.
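    For illustration, such a fully connected autoencoder can be sketched as below (PyTorch); the layer widths and latent dimension are our own assumptions, since they are not specified here, and the class name PatchAutoencoder is hypothetical.

```python
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    """Fully connected autoencoder for manifold learning [31]: vectorized 3D patches are
    mapped to a low-dimensional latent space and restored (layer sizes are assumptions)."""
    def __init__(self, patch_dim, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(patch_dim, 256), nn.LeakyReLU(),
                                 nn.Linear(256, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.LeakyReLU(),
                                 nn.Linear(256, patch_dim))

    def forward(self, patches):                    # patches: (N_patches, patch_dim)
        return self.dec(self.enc(patches))
```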

    Continuous-time images collected in a CUP system are usually autocorrelated and thus have an underlying low-rank structure. Considering the complexity of natural images, the relationship between the original tensor and its low-rank representation may be nonlinear. Luo et al. [33] proposed a hierarchical low-rank tensor factorization method, which represents a tensor in a low-hierarchical-tubal-rank form:

$$\mathcal{X} = g(\hat{\mathcal{A}} \Delta \hat{\mathcal{B}}), \tag{8}$$

where $\mathcal{A}\in\mathbb{R}^{n_1\times r\times n_3}$ and $\mathcal{B}\in\mathbb{R}^{r\times n_2\times n_3}$ with $r<\min(n_1,n_2)$, so that $\mathrm{rank}_h(\mathcal{X})\le r$ holds; $\Delta$ denotes the face-wise product between two tensors [35], i.e., $\mathcal{C}=\mathcal{A}\Delta\mathcal{B} \Leftrightarrow \mathcal{C}^{(i)}=\mathcal{A}^{(i)}\mathcal{B}^{(i)}$; and $g$ denotes a deep neural network (DNN) consisting of fully connected layers operating on the $n_3$ dimension. Inspired by low-hierarchical-tubal-rank tensor factorization, we design a low-rank tensor factorization embedding manifold learning unsupervised neural network to capture the underlying low-rank structure of the input images with a compact representation. We design the input, which is random noise in the original manifold learning, as $\hat{\mathcal{A}}\Delta\hat{\mathcal{B}}$, thus implicitly preserving the hierarchical tubal rank without computing an SVD and enhancing the low-rankness of the images. Here the DNN is used as a nonlinear transform from one tensor to another, imposing a global low-rank regularization that helps obtain a better low-rank representation under the hierarchical tubal rank. The optimization function of the proposed neural network can be written as

$$\hat{\mathcal{A}},\hat{\mathcal{B}},\hat{\Theta} = \arg\min_{\mathcal{A},\mathcal{B},\Theta} E\big(g_{\Theta}(\mathcal{A}\Delta\mathcal{B}), x_0\big), \quad \text{s.t. } \hat{x} = g_{\hat{\Theta}}(\hat{\mathcal{A}}\Delta\hat{\mathcal{B}}), \tag{9}$$

where $\mathcal{A}\in\mathbb{R}^{n_1\times r\times n_3}$, $\mathcal{B}\in\mathbb{R}^{r\times n_2\times n_3}$, $r<\min(n_1,n_2)$, $g_{\Theta}$ denotes the manifold learning network, and $\Theta$ denotes its parameters.
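    The sketch below illustrates Eq. (8): the face-wise product of the learnable factors followed by a small fully connected network g acting along the n3 dimension. The two-layer MLP and the names facewise_product and LowRankInput are our own assumptions for illustration.

```python
import torch
import torch.nn as nn

def facewise_product(A, B):
    """Face-wise product Δ: C^(i) = A^(i) B^(i) for every frontal slice i along n3.
    A: (n1, r, n3), B: (r, n2, n3) -> C: (n1, n2, n3)."""
    return torch.einsum('irk,rjk->ijk', A, B)

class LowRankInput(nn.Module):
    """Learnable factors A, B and a DNN g on the tube (n3) dimension, so the generated
    tensor keeps a low hierarchical tubal rank without computing any SVD [cf. Eq. (8)]."""
    def __init__(self, n1, n2, n3, r, hidden=64):
        super().__init__()
        self.A = nn.Parameter(0.1 * torch.randn(n1, r, n3))
        self.B = nn.Parameter(0.1 * torch.randn(r, n2, n3))
        self.g = nn.Sequential(nn.Linear(n3, hidden), nn.LeakyReLU(),
                               nn.Linear(hidden, n3))

    def forward(self):
        X = facewise_product(self.A, self.B)       # tubal rank of X is at most r
        return self.g(X)                           # nonlinear transform of the tube fibers
```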

    The prior information of the physical model can be used to guide the updating of the network parameters. Combining the hardware physical model with the proposed deep learning algorithm, the optimization task can be written as

$$\begin{aligned}
\hat{\mathcal{A}},\hat{\mathcal{B}},\hat{\Theta} = \arg\min_{\mathcal{A},\mathcal{B},\Theta} \Big( &\|E - \mathbf{O}\, g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\|_2^2 + \rho\, \|E_{\mathrm{ccd}} - \mathbf{T}\, g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\|_2^2 \\
&+ \tau\, \|E_{\mathrm{n}} - \mathbf{TS}\, g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\|_2^2 + \lambda\, \mathrm{TV}_{\mathrm{3D}}\big(g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\big) \Big), \quad \text{s.t. } \hat{I} = g_{\hat{\Theta}}(\hat{\mathcal{A}}\Delta\hat{\mathcal{B}}),
\end{aligned} \tag{10}$$

where, for the narrow-slit term, only the slit FoV area of $g_{\Theta}(\mathcal{A}\Delta\mathcal{B})$ is used; $\mathrm{TV}_{\mathrm{3D}}(x)$ is the total variation (TV) [36,37] of the 3D image (the 3D TV norm); $\rho$ is generally set to 0.1; and $\tau$ is set to 0.5.
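    A minimal PyTorch sketch of the loss in Eq. (10) follows; forward_wide, forward_ccd, and forward_slit stand for differentiable versions of O, T, and TS, the anisotropic form of the 3D TV term and the value of λ are our own assumptions, and the restriction to the slit FoV is assumed to be handled inside forward_slit.

```python
def sitc_loss(I, E, E_ccd, E_n, forward_wide, forward_ccd, forward_slit,
              rho=0.1, tau=0.5, lam=1e-3):
    """Eq. (10): fidelity to the three measurements plus a 3D TV regularizer.
    I: reconstructed data cube g_Theta(A Δ B) of shape (Nt, Ny, Nx)."""
    fid_wide = ((forward_wide(I) - E) ** 2).sum()       # wide-slit streak camera term
    fid_ccd  = ((forward_ccd(I) - E_ccd) ** 2).sum()    # external CCD term
    fid_slit = ((forward_slit(I) - E_n) ** 2).sum()     # narrow-slit streak camera term
    tv3d = (I.diff(dim=0).abs().sum() + I.diff(dim=1).abs().sum()
            + I.diff(dim=2).abs().sum())                # anisotropic 3D TV (an assumption)
    return fid_wide + rho * fid_ccd + tau * fid_slit + lam * tv3d
```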

    As mentioned above, the initial input affects both the convergence speed and the final convergence result of the designed network [34]. The $y$-$t$ scenes collected by the narrow-slit streak camera have a temporal behavior similar to that of the original dynamic scene. Therefore, in order to use the prior information of the hardware system more efficiently, the image obtained by the narrow-slit streak camera is used as the initial input. As shown in Fig. 2, Deep Tensor [38,39] is an unsupervised learning method that uses two generative neural networks to obtain the factorized $\mathcal{A}_0$ and $\mathcal{B}_0$. Taking Eq. (11) as the loss function and employing the implicit prior of the DNNs in Fig. 2, we obtain the factorized $\mathcal{A}_0$ and $\mathcal{B}_0$ of the narrow-slit streak camera image as the initial input of the algorithm shown in Fig. 3:

$$\min_{\theta_U,\theta_V} \big\| \mathcal{X} - f_U(z_U)\, f_V(z_V)^{\mathrm T} \big\|^2, \tag{11}$$

where $f_U(z_U)$ and $f_V(z_V)$ are the outputs of the two neural networks, $z_U$ and $z_V$ are the random input noise of the two networks, $\theta_U$ and $\theta_V$ are their parameters, and $\mathcal{X}$ is the tensor to be factorized.
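    The following sketch shows the Deep Tensor idea of Eq. (11) in its matrix form: two small generators map fixed random noise to the factors, and their product is fitted to the narrow-slit image. Layer widths, iteration count, and the function name deep_tensor_factorize are our own assumptions; the tensor factors A0 and B0 are obtained analogously (e.g., slice by slice).

```python
import torch
import torch.nn as nn

def deep_tensor_factorize(X, r, steps=2000, lr=1e-2):
    """Unsupervised factorization of Eq. (11): minimize ||X - f_U(z_U) f_V(z_V)^T||^2
    over the parameters of the two generators (X: the image to factorize, target rank r)."""
    n1, n2 = X.shape
    zU, zV = torch.randn(n1, r), torch.randn(n2, r)      # fixed random inputs
    fU = nn.Sequential(nn.Linear(r, 128), nn.LeakyReLU(), nn.Linear(128, r))
    fV = nn.Sequential(nn.Linear(r, 128), nn.LeakyReLU(), nn.Linear(128, r))
    opt = torch.optim.Adam(list(fU.parameters()) + list(fV.parameters()), lr=lr)
    for _ in range(steps):
        loss = ((X - fU(zU) @ fV(zV).T) ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return fU(zU).detach(), fV(zV).detach()              # factors used to initialize the input
```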

    Figure 2.Schematic diagram of the Deep Tensor unsupervised low-rank factorization network. Xi, Yi: the input elements of fU, fV; Xi′, Yi′: the output elements of fU, fV; Wi, Wi′: the parameters of every two layers for fU, fV.

    Figure 3.Flowchart of the proposed algorithm. Loss: loss function; E: image collected by the wide-slit streak camera; Eccd: image collected by the external CCD camera; En: image collected by the narrow-slit streak camera; X0: result of the first iterative step; SC: streak camera; patch: patch and delay embedding process of 3D image in manifold learning [31]; W1, W2, W3, W4: the parameters of every two fully connected layers.

    In addition, we design an enhancement iteration step. After a certain number of iterations, we obtain an initial result $X_0$. We then use Deep Tensor to perform a low-rank factorization of $X_0$, obtaining $\mathcal{A}_1$ and $\mathcal{B}_1$ as the new input of the network, and start the enhancement iteration with the loss function changed to Eq. (12):

$$\begin{aligned}
\hat{\mathcal{A}},\hat{\mathcal{B}},\hat{\Theta} = \arg\min_{\mathcal{A},\mathcal{B},\Theta} \Big( &\|E - \mathbf{O}\, g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\|_2^2 + \rho\, \|E_{\mathrm{ccd}} - \mathbf{T}\, g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\|_2^2 + \tau\, \|E_{\mathrm{n}} - \mathbf{TS}\, g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\|_2^2 \\
&+ \kappa\, \|g_{\Theta}(\mathcal{A}\Delta\mathcal{B}) - X_0\|_2^2 + \lambda\, \mathrm{TV}_{\mathrm{3D}}\big(g_{\Theta}(\mathcal{A}\Delta\mathcal{B})\big) \Big), \quad \text{s.t. } \hat{I} = g_{\hat{\Theta}}(\hat{\mathcal{A}}\Delta\hat{\mathcal{B}}),
\end{aligned} \tag{12}$$

where $X_0$ is the result of the first iteration stage and $\kappa$ is generally set to 0.1. The whole flowchart of the algorithm is shown in Fig. 3. It is worth noting that the proposed joint learning model does not require any additional training datasets.
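    Putting the pieces together, the sketch below outlines the two-stage procedure of Fig. 3, reusing LowRankInput and sitc_loss from the sketches above. Seeding the factors from A0/B0 (and, in the second stage, from a Deep Tensor factorization of X0), as well as the patch/delay embedding and reshaping steps, are omitted for brevity; this is an illustrative driver, not the authors' released implementation.

```python
import torch

def reconstruct(E, E_ccd, E_n, fwd_wide, fwd_ccd, fwd_slit,
                n1, n2, n3, r, n_iter=1500, kappa=0.1, lr=1e-2):
    """Stage 1 minimizes Eq. (10); stage 2 restarts from the intermediate result X0 and
    adds the consistency term kappa * ||I - X0||^2 of Eq. (12)."""
    model = LowRankInput(n1, n2, n3, r)                  # factors would be seeded from A0, B0
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_iter):                              # first iteration stage, Eq. (10)
        I = model()
        loss = sitc_loss(I, E, E_ccd, E_n, fwd_wide, fwd_ccd, fwd_slit)
        opt.zero_grad(); loss.backward(); opt.step()
    X0 = model().detach()                                # result of the first stage

    model = LowRankInput(n1, n2, n3, r)                  # re-seeded from Deep Tensor factors of X0
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_iter):                              # enhancement iteration, Eq. (12)
        I = model()
        loss = (sitc_loss(I, E, E_ccd, E_n, fwd_wide, fwd_ccd, fwd_slit)
                + kappa * ((I - X0) ** 2).sum())
        opt.zero_grad(); loss.backward(); opt.step()
    return model().detach()
```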

    3. RESULTS

    A. Simulation Results

    Two parts of simulation are tested to demonstrate the superiority of the proposed system and algorithm. The first part covers four motion scenes selected from public datasets: Drop and Aerial, each containing 16 consecutive images of 256×256 pixels, and Runner and Crash, each containing 24 consecutive images of 256×256 pixels. This part is used to validate the performance of the proposed algorithm on natural images. The second part simulates LED on/off behavior with complex dynamic processes and is used to verify the robustness of the proposed algorithm in scenes with severe spatial-temporal aliasing. We designed three LED on/off scenes with an FoV of 256×256 pixels. Each scene has eight rows of LEDs, which are lit sequentially from bottom to top; the number of LEDs in each row from bottom to top is 1–8. Different rows of LEDs are connected in parallel, while the LEDs within a row are connected in series. Therefore, the LEDs in each row turn on and off at the same time, and the voltage across each row is the same. Adjacent LEDs in a row are equally spaced, and the center-to-center distance between adjacent rows is also constant. Each LED is designed to be brightest at its center, with the brightness decreasing from inside to outside. The difference between the three scenes lies in the on-duration and the turn-on interval of the rows. The three scenes are named LED8-4, LED8-5, and LED10-3, where the first number is the duration in frames and the second number is the interval in frames between the turn-on of adjacent rows. For example, LED10-3 indicates that the eighth row is on at the first frame and off at the 11th frame, while the seventh row is on at the fourth frame and off at the 14th frame. Therefore, we can calculate that the total numbers of frames of LED8-4, LED8-5, and LED10-3 are 36, 43, and 31, respectively.
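    These frame counts follow directly from the row stagger; as a check (our own restatement of the naming convention above),

$$N_{\text{frames}} = (8-1)\times \text{interval} + \text{duration},$$

so LED8-4 gives $7\times 4 + 8 = 36$ frames, LED8-5 gives $7\times 5 + 8 = 43$, and LED10-3 gives $7\times 3 + 10 = 31$.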

    During the simulation process, we set the transmittance of the transparent region of the coding plate to 0.8 and that of the opaque region to 0.2 in order to be closer to the actual situation. Meanwhile, inspired by Ref. [40], we designed a coding plate with a 30% sampling rate, meaning that the 0.8-transmittance region accounts for 30% of the pixels while the 0.2-transmittance region accounts for 70%. The proposed algorithm is implemented in Python, using the Adam optimizer [41] with a learning rate of 0.01 and 1500 iterations, on a computer with an Intel Core i5-12400K CPU and an NVIDIA GeForce RTX 3090 GPU. The running time ranges from 4 to 8 min depending on the number of reconstructed frames.
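    The coding plate described above can be generated as in the short sketch below; make_mask is an illustrative name, and only the sampling rate and the two transmittance levels are taken from the text.

```python
import numpy as np

def make_mask(ny, nx, sampling_rate=0.3, t_open=0.8, t_block=0.2, seed=0):
    """Pseudo-random coding plate: 30% of pixels at transmittance 0.8, 70% at 0.2."""
    rng = np.random.default_rng(seed)
    open_pixels = rng.random((ny, nx)) < sampling_rate
    return np.where(open_pixels, t_open, t_block)

mask = make_mask(256, 256)   # matches the 256 x 256 simulation FoV
```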

    For the motion scenes of the public datasets, we perform the imaging process shown in Fig. 1(b) on the dynamic scene to obtain the three images collected by the different cameras. Specifically, for the wide-slit streak camera, each frame is encoded by the designed pseudo-random coding matrix. Subsequently, every encoded frame is horizontally displaced by one pixel relative to the previous frame, emulating the shearing operation of the streak camera. Finally, the scenes are projected and integrated along the t-axis to obtain the 2D compressed measurement. The external CCD measurement is acquired with only a projection-integration step over the dynamic scene. For the narrow-slit streak camera, we select the centermost 10 columns of pixels as the FoV, and the measurement is obtained by a shearing and projection-integration operation similar to that of the wide-slit streak camera. We compare the proposed method with GAP-TV [20], FFDNet [16], FastDVDnet [17], DeSCI [15], and TUIC-CUP [11], where the parameters are taken as the default values in the literature. In addition, to illustrate the superiority of the designed low-rank tensor embedding algorithm, the narrow-slit streak camera loss function is directly added to TUIC-CUP without changing the network structure for comparison [SITC-CUP (w/o LR) in the table].
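    Using the operator and mask sketches given earlier, the three simulated measurements can be synthesized roughly as follows; the random cube and the choice of columns 123-132 as the centermost 10-pixel slit are placeholders for illustration.

```python
import numpy as np

scene = np.random.rand(16, 256, 256)            # placeholder for a 16-frame, 256 x 256 data cube
mask = make_mask(256, 256)                      # coding plate from the sketch above
E     = cup_forward(scene, mask)                # wide-slit compressed snapshot
E_ccd = ccd_forward(scene)                      # time-integrated external CCD image
cols  = list(range(123, 133))                   # centermost 10 columns as the slit FoV
E_n   = slit_forward(scene, cols)               # narrow-slit streak (y-t) image
```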

    Figure 4.(a) The dynamic compressed image of the wide-slit streak camera in LED10-3 simulation process. (b) The integral image of CCD camera in LED10-3 simulation process, where the blue box area is the slit FoV area. (c) The dynamic image of the narrow-slit streak camera in LED10-3 simulation process.

    We selected a single frame from each dataset to evaluate the performance of the different algorithms. Figures 5 and 6 display the reconstruction results of the two parts by different algorithms, together with the ground truths for comparison. From the results in Fig. 5, it can be seen that our proposed algorithm has the best performance. Compared with SITC-CUP (w/o LR), the details are better recovered because the low-rank structure of the images is maintained. In Fig. 6, it can be seen that GAP-TV and DeSCI have difficulty performing spatial-temporal unmixing of the compressed image due to the complex temporal behaviors of the dynamic scene, while FFDNet and FastDVDnet also perform poorly in detail recovery. For TUIC-CUP, the additional CCD camera provides spatial and intensity constraints in the reconstruction process and improves the overall reconstruction quality. However, in dynamic scenes with such complex temporal behavior, the CCD camera constraint may introduce artifacts (for example, the upper area of LED8-4 #10) in areas where nothing is present. The narrow-slit streak camera image constrains the temporal behavior of the imaging system, so the proposed model can suppress this effect and achieve higher image fidelity.

    Figure 5.Reconstruction results of the natural image datasets by different algorithms, together with the ground truths for comparison. The sub-image at the bottom right corner is the enlarged area in the corresponding red box.

    Figure 6.Reconstruction results of the LED simulation datasets by different algorithms, together with the ground truths for comparison. The sub-image at the bottom right corner is the enlarged area in the corresponding yellow box.

    B. Experiment Results

    In order to verify the reliability of the proposed CUP system and algorithm in practical application, we designed an LED on/off experiment using the imaging system shown in Fig. 1. The light from the LEDs is divided into three beams by two beam splitters and collected by the wide-slit streak camera, the CCD camera, and the narrow-slit streak camera. In order to balance the light intensity of the three beams, we use a 7:3 transmission-reflection beam splitter and a half-transmission beam splitter; the resulting light intensity ratio of the wide-slit streak camera, CCD camera, and narrow-slit streak camera is 6:7:7. There are five rows of LEDs in the dynamic scene. As shown in Fig. 7(a), each row of LEDs is controlled by a DG645 digital delay-pulse generator, and the voltage of each row is 5 V except for the bottom row; the voltage of the bottom row is 2 V to avoid the bottom row, which has only one LED, being too bright. In addition, Fig. 7(b) shows the photodiode signals for each row of LEDs. In the experiment, we set the duration of each row of LEDs to 20 ns. If the first row of LEDs turns on at 0 ns, the turn-on moments of the following four rows are 10 ns, 25 ns, 35 ns, and 45 ns, respectively. Figure 8(a) shows the static image of the narrow-slit streak camera, from which we can observe that the width of the narrow slit is 10 pixels. The images collected by the three cameras are shown in Fig. 8. It is worth noting that the reconstructed image and the 1D constrained image have been calibrated temporally with a step size of 1 ns by collecting images from the wide-slit (XIOPM-6200) and narrow-slit (XIOPM-5200) streak cameras at different moments using a picosecond laser with a pulse width of 70 ps. The calibrated images at three specific moments are shown in Fig. 9. Based on the narrow-slit streak image collected in the experiment, 1D constrained images at different moments can be obtained according to the calibration results.

    Figure 7.(a) The relationship between the trigger voltage of DG645 digital delay-pulse generator and time for each row of LEDs. (b) The relationship between normalized intensity and time for each row of LEDs measured by the photodiode.

    Figure 8.(a) The static image of the narrow-slit streak camera in the experiment. (b) The dynamic compressed image of the wide-slit streak camera in the experiment. (c) The integral image of CCD camera in the experiment. (d) The dynamic image of the narrow-slit streak camera in the experiment.

    Figure 9.The calibrated images of (a) wide-slit streak camera and (b) narrow-slit streak camera at specific moments (20 ns interval).

    By combining the proposed algorithm and the SITC-CUP system, 50 frames are reconstructed with an interval of 1.78 ns per frame; a video of the reconstruction results is available in Visualization 1 and Visualization 2. We selected a representative frame every five frames starting from the first frame, giving a frame interval of 8.9 ns. Figure 10 shows the comparison of the reconstruction results by different methods. It can be seen that our proposed algorithm is able to reconstruct the spatial-temporal evolution of the LED scenes well, while the other algorithms struggle to unmix the dynamic scene spatially-temporally. Even TUIC-CUP cannot reconstruct the temporal behavior of the scenes well, and there are many artifacts in its reconstruction results. In order to further evaluate the temporal behavior of our reconstruction results, we separate the five rows of LEDs and calculate their on/off moments individually. It is not easy to determine the on/off moments of the LEDs due to the noise of the imaging process, the diffusion effect of the LEDs, and the point spread function of the imaging system. As shown in Fig. 11(a), we select the white box area with distinct edges in the narrow-slit streak camera image and integrate along the y-direction; we then obtain the relationship between the normalized intensity and scanning time shown in Fig. 11(b). We find that if we take 15% of the peak as the boundary of the light, the calculated duration of the LEDs in this row is consistent with the duration set on the DG645. Therefore, we use 15% of the peak as the boundary of the LEDs for convenient computation.
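    The 15%-of-peak criterion can be applied to a normalized intensity trace as in the short sketch below; on_off_times is an illustrative helper, and a simple first/last-crossing rule is assumed.

```python
import numpy as np

def on_off_times(t, intensity, frac=0.15):
    """On/off moments of one LED row from its intensity trace, taking frac * peak
    as the boundary; returns (t_on, t_off, duration)."""
    above = np.where(intensity >= frac * intensity.max())[0]
    t_on, t_off = t[above[0]], t[above[-1]]
    return t_on, t_off, t_off - t_on
```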

    Figure 10.Reconstruction results of the experiment by different methods.

    Figure 11.(a) The selected area (the white box area) of the narrow-slit streak camera image. (b) The relationship between the normalized intensity and scanning time of the selected area in the narrow-slit streak camera image.

    Figure 12 displays the relationship between normalized intensity and time of the five rows of LEDs for our proposed algorithm. In addition, Table 3 presents the calculated on/off moments and duration of each row of LEDs obtained by taking 15% of the peak as the boundary. If the turn-on moment of the first row of LEDs is recorded as 0 ns, the turn-on moments of the following rows calculated from Table 3 are 9.72 ns, 25.96 ns, 38.71 ns, and 48.76 ns, respectively. The temporal behaviors of the reconstruction results are in good agreement with reality, both in terms of durations and on/off moments. In addition, Fig. 13 shows the relationship between normalized intensity and time of the first and fourth rows of LEDs for the reconstruction results of different methods, which indicates that the reconstructed temporal behaviors of our proposed SITC-CUP are closest to the real situation.

    Table 3. On/Off Time and Duration (in ns) for Each Row of LEDs

    Row Number    1st      2nd      3rd      4th      5th
    On time       4.58     14.30    30.54    43.29    53.34
    Off time      24.42    36.21    51.95    65.23    74.17
    Duration      19.84    21.91    21.41    21.94    20.83

    Figure 12.The relationship between normalized intensity and time of the five rows of LEDs for our proposed algorithm. The dashed line indicates the lighting time calculated by choosing 15% peak as the boundary.

    Figure 13.The relationship between normalized intensity and time of the first row of LEDs (a) and the fourth row of LEDs (b) for reconstruction results by different methods.

    4. CONCLUSION

    In summary, we propose a spatial-intensity-temporal cooperative constrained CUP system and an unsupervised deep learning algorithm with low-rank tensor factorization embedding. By introducing the narrow-slit streak camera as a constraint term, a temporal behavior constraint is established in the image reconstruction process. By combining manifold learning with low-rank tensor factorization in the proposed neural network, the reconstructed images have a better global structure even without training datasets. In addition, the factorized low-rank tensor of the narrow-slit streak camera image is used as the input of the network, which not only accelerates convergence but also further improves the quality of the reconstruction results. Simulation and experimental results demonstrate that the proposed algorithm and imaging system can obtain high-fidelity reconstructions of dynamic scenes with complex temporal behaviors, and they are expected to find practical applications in biomedical and pulsed radiation source fields, such as blood flow, brain activity, cellular dynamics [43], and Z-pinch [23].

    [1] J. S. Courtney-Pratt. A new method for the photographic study of fast transient phenomena. Res. Supply, 2, 287-295(1949).

    [16] X. Yuan, Y. Liu, J. Suo. Plug-and-play algorithms for large-scale snapshot compressive imaging. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1444-1454(2020).

    [20] X. Yuan. Generalized alternating projection based total variation minimization for compressive sensing. IEEE International Conference on Image Processing, 2539-2543(2016).

    [27] J. Nie, L. Zhang, C. Wang. Robust deep hyperspectral imagery super-resolution. IEEE International Geoscience and Remote Sensing Symposium, 847-850(2019).

    [33] Y. Luo, X. Zhao, D. Meng. HLRTF: hierarchical low-rank tensor factorization for inverse problems in multi-dimensional imaging. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19281-19290(2022).

    [36] J. Liu, Y. Sun, X. Xu. Image restoration using total variation regularized deep image prior. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7715-7719(2019).

    [40] M. Zhao, S. Jalali. Theoretical analysis of binary masks in snapshot compressive imaging systems. 2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1-8(2023).

    Paper Information

    Category: Imaging Systems, Microscopy, and Displays

    Received: Jan. 17, 2025

    Accepted: Apr. 27, 2025

    Published Online: Jul. 1, 2025

    The Author Email: Liang Sheng (shengliang@nint.ac.cn), Yan Song (songyan@nint.ac.cn)

    DOI: 10.1364/PRJ.555010

    CSTR: 32188.14.PRJ.555010
