Advanced Photonics Nexus, Volume 4, Issue 4, 046017 (2025)

High-speed video imaging via multiplexed temporal gradient snapshot (Editors' Pick)

Yifei Zhang1, Xing Liu2, Lishun Wang2, Ping Wang2, Ganzhangqin Yuan1, Mu Ku Chen3, Kui Jiang4, Xin Yuan2,*, and Zihan Geng1,5,*
Author Affiliations
  • 1Tsinghua University, Tsinghua Shenzhen International Graduate School, Shenzhen, China
  • 2Westlake University, Research Center for Industries of the Future and School of Engineering, Hangzhou, China
  • 3City University of Hong Kong, Department of Electrical Engineering, Hong Kong, China
  • 4Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China
  • 5Pengcheng Laboratory, Shenzhen, China
    Figures & Tables (18)
    Fig. 1. Working principles of SpeedShot. (a) Temporal gradient (TG) images are sparse motion representations. (b) SpeedShot multiplexes TG images into multiframe motion representations, which assist high-speed video reconstruction. (c) The proposed framework is compatible with low-end commercial cameras via coded exposure photography.
    Fig. 2. (a) Mathematical model for hardware encoding. (b) Visualization of SpeedShot splitting an uncoded long-exposure image into two coded exposures, Yc1 and Yc2, which yields a multiplexed TG image T.
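The split described in Fig. 2 can be illustrated numerically. Below is a minimal NumPy sketch (the code vectors `c1` and `c2` and the complementarity assumption are our illustration, not taken verbatim from the paper): two complementary binary exposure codes divide a long exposure into Yc1 and Yc2, and their difference forms a multiplexed temporal-gradient image.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((8, 4, 4))            # 8 high-speed frames (hypothetical scene)

c1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # binary exposure code, camera 1
c2 = 1 - c1                               # complementary code, camera 2

y_long = frames.sum(axis=0)                       # uncoded long exposure
y_c1 = (c1[:, None, None] * frames).sum(axis=0)   # coded exposure Yc1
y_c2 = (c2[:, None, None] * frames).sum(axis=0)   # coded exposure Yc2
t = y_c1 - y_c2                                   # multiplexed TG image T

# Complementary codes recompose the long exposure, so the second coded
# observation can also be obtained as y_long - y_c1.
assert np.allclose(y_c1 + y_c2, y_long)
```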
    Fig. 3. Overview of MSRT. A three-level snapshot pyramid is generated first. At each pyramid level k, the dual observation Y(k) passes through the network to produce a predicted video X^(k) and a refined feature F^(k). Guided by the error E(k) between X^(k) and Y(k), F^(k) is then fused into the next-level reconstruction at a larger scale. The network is applied recurrently over three iterations.
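The control flow in the Fig. 3 caption can be sketched in a few lines of Python. Here `reconstruct`, `reproject`, and `fuse` are trivial stand-ins for the learned modules, so this shows only the coarse-to-fine recurrence, not the actual network:

```python
import numpy as np

def downsample(y, k):
    """Average-pool by 2**k to form one level of the snapshot pyramid."""
    f = 2 ** k
    h, w = (y.shape[0] // f) * f, (y.shape[1] // f) * f
    return y[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

# Identity-style placeholders for the learned modules.
def reconstruct(y_k, feat):   # predicts a video X^(k); here: passthrough
    return y_k

def reproject(x_hat):         # re-renders the observation from X^(k)
    return x_hat

def fuse(x_hat, err):         # refined feature F(k), guided by error E(k)
    return x_hat + err

def msrt_sketch(y, levels=3):
    """Coarse-to-fine recurrence over a three-level snapshot pyramid."""
    feat, x_hat = None, None
    for k in reversed(range(levels)):   # coarsest level first
        y_k = downsample(y, k)
        x_hat = reconstruct(y_k, feat)
        err = y_k - reproject(x_hat)    # error E(k) w.r.t. the observation
        feat = fuse(x_hat, err)         # passed on to the next, finer level
    return x_hat
```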
    Fig. 4. Details of the ECSF. The feature Fin(k) from the feature extraction block is fused with F(k−1), yielding a fused feature Fout(k).
    Fig. 5. Details of the motion-guided hybrid enhancement (MOGHE) block.
    Fig. 6. Example of SpeedShot's reconstruction on Adobe240. From a pair of simultaneously captured observations, SpeedShot records and restores tens of frames of a dynamic scene with nonlinear motion. Interpolation methods often fail in such scenarios because pixel correspondence between the first and last frames is lacking. (a) Input. (b) Reconstructions.
    Fig. 7. Visual comparison on the GoPro dataset at 8× speed-up.
    Fig. 8. Selected reconstructions on Set6. EfficientSCI is the state-of-the-art VSCI method.
    Fig. 9. Paired observations from our SpeedShot dual-RGB-camera prototype and the corresponding 8× reconstructions. The temporal gradient image highlights frame-wise object motion and reflects the trajectory of the movement.
    Fig. 10. Imaging prototype with an external shutter for optical modulation, accelerating a 60 Hz camera to 960 Hz.
    Fig. 11. Inputs and reconstructions from the SpeedShot prototype with an external mechanical shutter at 960 Hz. One camera captures a temporally coded observation, whereas the other captures a blurry, uncoded image. Subtracting the coded observation from the blurry one yields an additional coded observation, enabling 960 fps video reconstruction.
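The prototype's arithmetic is straightforward: a 60 Hz base frame rate with (presumably) 16 shutter chips per exposure yields 960 Hz, and the second coded observation comes from a subtraction. A toy check follows; the chip count is inferred from 960/60, not stated in this excerpt, and the scalars stand in for full images:

```python
base_fps = 60                      # native camera frame rate (prototype)
chips_per_exposure = 16            # inferred: 960 / 60
effective_fps = base_fps * chips_per_exposure
print(effective_fps)               # 960

# Second coded observation by subtraction (scalar stand-ins for images):
y_blurry, y_coded = 10.0, 6.0
y_coded_2 = y_blurry - y_coded     # complementary coded observation
```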
    Fig. 12. Comparison with previous single-camera CEP (coded exposure photography).
    • Table 1. Results for multiframe restoration on Adobe240 and GoPro. Best results are in bold. Note that known sharp frames are not included in the metrics for VFI methods, whereas all frames are considered for MSRT.


      Entries are PSNR/SSIM; "—" marks values not reported.

      Method | Network inputs | Adobe240 8× | Adobe240 16× | Adobe240 32× | GoPro 8× | GoPro 16× | GoPro 32× | Parameters (M)/Runtime (s)
      SuperSloMo [39] | First and last clear images | 28.64/0.884 | 22.80/0.728 | 20.86/0.602 | 28.98/0.875 | 24.38/0.747 | 20.45/0.618 | 39.6/0.38
      IFRNet [41] | First and last clear images | 30.49/0.916 | 25.59/0.807 | 21.43/0.653 | 29.81/0.893 | 24.38/0.746 | 20.69/0.620 | 19.7/0.42
      GiMM-VFI [42] | First and last clear images | 33.00/0.939 | 26.83/0.837 | 21.93/0.663 | 30.31/0.894 | 25.03/0.752 | 21.56/0.638 | 30.6/0.93
      UPR-Net [40] | First and last clear images | 32.34/0.934 | 26.46/0.823 | 21.84/0.661 | 29.69/0.885 | 24.86/0.749 | 21.50/0.635 | 3.7/0.76
      TimeReplayer [44] | First and last clear images + event data stream | 34.14/0.950 | — | — | 34.02/0.960 | — | — | —
      TimeLens* [43] | First and last clear images + event data stream | 34.45/0.951 | — | — | 34.81/0.959 | 33.21/0.942 | — | 72.2/—
      A2OF [45] | First and last clear images + event data stream | 36.59/0.960 | — | — | 36.61/0.971 | — | — | —
      MSRT (Ours) | Two coded blurry images | 39.08/0.983 | 35.22/0.968 | 33.26/0.958 | 39.42/0.984 | 36.44/0.968 | 32.54/0.939 | 14.9/0.74
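For reference, the PSNR figures throughout these tables follow the standard definition; a minimal NumPy version, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, peak]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)      # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(a, b), 1))   # 20.0
```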
    • Table 2. Comparison with image deblurring methods on GoPro. Best result is in bold, second best in italic.


      Method | Input | Amplitude | Output | PSNR/SSIM
      NAFNet [51] | 1 blurry frame | 7 to 13 frames | 1 sharp frame | 33.69/0.967
      Restormer [46] | 1 blurry frame | 7 to 13 frames | 1 sharp frame | 32.92/0.961
      Stripformer [47] | 1 blurry frame | 7 to 13 frames | 1 sharp frame | 33.08/0.962
      DeepCE [21] | 1 coded frame | 32 frames | 1 sharp frame | 28.10/0.8627
      EFNet [48] | 1 blurry frame + event frames | 7 to 13 frames | 1 sharp frame | 35.46/0.972
      REFID [49] | 1 blurry frame + event frames | 7 to 13 frames | 1 sharp frame | 35.91/0.973
      MSRT (Ours) | 2 coded frames | 8 frames | 8 sharp frames | 39.42/0.984
      MSRT (Ours) | 2 coded frames | 16 frames | 16 sharp frames | 36.44/0.968
      MSRT (Ours) | 2 coded frames | 32 frames | 32 sharp frames | 32.54/0.939
    • Table 3. Comparison for 8× video reconstruction on Set6. Best results are in bold, second best in italic.


      Method | Input | PSNR | SSIM
      UPR-Net [40] | VFI | 31.08 | 0.872
      IFRNet [41] | VFI | 28.50 | 0.813
      EfficientSCI [33] | VSCI | 35.43 | 0.959
      MSRT (Ours) | SpeedShot | 34.72 | 0.925
    • Table 4. Analysis of coding pattern selection on Set6.


      Coding length | PSNR (dB) | Std. dev. of PSNR across patterns
      8× | 34.08 | 0.45
      8×, symmetric | 26.13 | —
      16× | 29.17 | 0.31
      32× | 25.25 | 0.27
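The kinds of patterns compared above can be generated as follows; the generators below are our hedged illustration, not the paper's actual codes: random binary codes with a 50% duty cycle, plus a time-symmetric (palindromic) variant. One plausible reading of the symmetric row's lower PSNR is that a palindromic code gives a video and its time reversal identical observations, making motion direction ambiguous.

```python
import numpy as np

def random_code(length, seed=None):
    """Random binary exposure code with a 50% duty cycle."""
    rng = np.random.default_rng(seed)
    code = np.array([1] * (length // 2) + [0] * (length - length // 2))
    rng.shuffle(code)
    return code

def symmetric_code(length, seed=0):
    """Time-symmetric (palindromic) code: second half mirrors the first."""
    half = random_code(length // 2, seed)
    return np.concatenate([half, half[::-1]])

c = random_code(16, seed=1)
s = symmetric_code(16)
assert c.sum() == 8                 # 50% duty cycle
assert (s == s[::-1]).all()         # palindromic by construction
```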
    • Table 5. Analysis of noise and calibration robustness on 8× Set6.


      Read noise | Shot noise | Timing jitter | PSNR/SSIM
      ✓ | — | — | 34.15/0.904
      — | ✓ | — | 34.25/0.912
      ✓ | ✓ | — | 33.82/0.896
      — | — | 0% to 1% | 34.27/0.921
      — | — | 0% to 3% | 33.52/0.916
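The noise settings above can be emulated with standard sensor models; the sketch below (parameter values are illustrative, not the paper's) applies Poisson shot noise and Gaussian read noise to a clean observation in [0, 1]. Timing jitter would additionally perturb each chip boundary by the stated percentage of the chip duration.

```python
import numpy as np

def add_sensor_noise(y, read_sigma=0.01, photons=1000.0, seed=0):
    """Poisson shot noise (photon counting) + Gaussian read noise."""
    rng = np.random.default_rng(seed)
    shot = rng.poisson(np.clip(y, 0.0, None) * photons) / photons
    return shot + rng.normal(0.0, read_sigma, size=y.shape)

y = np.full((16, 16), 0.5)          # clean, mid-gray observation
y_noisy = add_sensor_noise(y)
assert y_noisy.shape == y.shape
```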
    • Table 6. Ablation on MSRT network design. Best result is in bold.


      SRUN | Error-aware | Motion-guidance | PSNR/SSIM
       |  |  | 33.44/0.915
       |  |  | 33.76/0.917
       |  |  | 34.05/0.920
       |  |  | 34.40/0.923
       |  |  | 34.27/0.921
      ✓ | ✓ | ✓ | 34.72/0.925
    Citation: Yifei Zhang, Xing Liu, Lishun Wang, Ping Wang, Ganzhangqin Yuan, Mu Ku Chen, Kui Jiang, Xin Yuan, Zihan Geng, "High-speed video imaging via multiplexed temporal gradient snapshot," Adv. Photon. Nexus 4, 046017 (2025)

    Paper Information

    Category: Research Articles

    Received: Mar. 26, 2025

    Accepted: Jul. 4, 2025

    Published Online: Aug. 15, 2025

    Author emails: Xin Yuan (xyuan@westlake.edu.cn), Zihan Geng (geng.zihan@sz.tsinghua.edu.cn)

    DOI: 10.1117/1.APN.4.4.046017

    CSTR: 32397.14.1.APN.4.4.046017
