Advanced Photonics Nexus, Volume 4, Issue 4, 046017 (2025)

High-speed video imaging via multiplexed temporal gradient snapshot (Editors' Pick)

Yifei Zhang1, Xing Liu2, Lishun Wang2, Ping Wang2, Ganzhangqin Yuan1, Mu Ku Chen3, Kui Jiang4, Xin Yuan2,*, and Zihan Geng1,5,*
Author Affiliations
  • 1Tsinghua University, Tsinghua Shenzhen International Graduate School, Shenzhen, China
  • 2Westlake University, Research Center for Industries of the Future and School of Engineering, Hangzhou, China
  • 3City University of Hong Kong, Department of Electrical Engineering, Hong Kong, China
  • 4Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China
  • 5Pengcheng Laboratory, Shenzhen, China
    Figures & Tables (18)
    Fig. 1. Working principles of SpeedShot. (a) Temporal gradient (TG) images are sparse motion representations. (b) SpeedShot multiplexes TG images into multiframe motion representations, which assist high-speed video reconstruction. (c) The proposed framework is compatible with low-end commercial cameras via coded exposure photography.
    Fig. 2. (a) Mathematical model for hardware encoding. (b) Visualization of SpeedShot splitting an uncoded long-exposure image into two coded exposures, Yc1 and Yc2, which yields a multiplexed TG image T.
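The split described in Fig. 2 can be illustrated numerically. Below is a minimal NumPy sketch (the code vectors `c1` and `c2` and the complementarity assumption are our illustration, not taken verbatim from the paper): two complementary binary exposure codes divide a long exposure into Yc1 and Yc2, and their difference forms a multiplexed temporal-gradient image.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((8, 4, 4))            # 8 high-speed frames (hypothetical scene)

c1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # binary exposure code, camera 1
c2 = 1 - c1                               # complementary code, camera 2

y_long = frames.sum(axis=0)                       # uncoded long exposure
y_c1 = (c1[:, None, None] * frames).sum(axis=0)   # coded exposure Yc1
y_c2 = (c2[:, None, None] * frames).sum(axis=0)   # coded exposure Yc2
t = y_c1 - y_c2                                   # multiplexed TG image T

# Complementary codes recompose the long exposure, so the second coded
# observation can also be obtained as y_long - y_c1.
assert np.allclose(y_c1 + y_c2, y_long)
```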
    Fig. 3. Overview of MSRT. A three-level snapshot pyramid is generated first. At each pyramid level k, the dual observation Y(k) passes through the network to produce a predicted video X^(k) and a refined feature F^(k). Guided by the error E(k) between X^(k) and Y(k), F^(k) is then fused into the next-level reconstruction at a larger scale. The network is applied recurrently over three iterations.
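The control flow in the Fig. 3 caption can be sketched in a few lines of Python. Here `reconstruct`, `reproject`, and `fuse` are trivial stand-ins for the learned modules, so this shows only the coarse-to-fine recurrence, not the actual network:

```python
import numpy as np

def downsample(y, k):
    """Average-pool by 2**k to form one level of the snapshot pyramid."""
    f = 2 ** k
    h, w = (y.shape[0] // f) * f, (y.shape[1] // f) * f
    return y[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

# Identity-style placeholders for the learned modules.
def reconstruct(y_k, feat):   # predicts a video X^(k); here: passthrough
    return y_k

def reproject(x_hat):         # re-renders the observation from X^(k)
    return x_hat

def fuse(x_hat, err):         # refined feature F(k), guided by error E(k)
    return x_hat + err

def msrt_sketch(y, levels=3):
    """Coarse-to-fine recurrence over a three-level snapshot pyramid."""
    feat, x_hat = None, None
    for k in reversed(range(levels)):   # coarsest level first
        y_k = downsample(y, k)
        x_hat = reconstruct(y_k, feat)
        err = y_k - reproject(x_hat)    # error E(k) w.r.t. the observation
        feat = fuse(x_hat, err)         # passed on to the next, finer level
    return x_hat
```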
    Fig. 4. Details of the ECSF. The feature Fin(k) from the feature extraction block is fused with F(k−1), yielding a fused feature Fout(k).
    Fig. 5. Details of the motion-guided hybrid enhancement (MOGHE) block.
    Fig. 6. Example of SpeedShot's reconstruction on Adobe240. From a pair of simultaneously captured observations, SpeedShot records and restores tens of frames of a dynamic scene with nonlinear motion. Interpolation methods often fail in such scenarios because pixel correspondence between the first and last frames is lacking. (a) Input. (b) Reconstructions.
    Fig. 7. Visual comparison on the GoPro dataset at 8× speed-up.
    Fig. 8. Selected reconstructions on Set6. EfficientSCI is the state-of-the-art VSCI method.
    Fig. 9. Paired observations from our SpeedShot dual-RGB-camera prototype and the corresponding 8× reconstructions. The temporal gradient image highlights frame-wise object motion and reflects the trajectory of the movement.
    Fig. 10. Imaging prototype with an external shutter for optical modulation, accelerating a 60 Hz camera to 960 Hz.
    Fig. 11. Inputs and reconstructions from the SpeedShot prototype with an external mechanical shutter at 960 Hz. One camera captures a temporally coded observation, whereas the other captures a blurry, uncoded image. Subtracting the coded observation from the blurry one yields an additional coded observation, enabling 960 fps video reconstruction.
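The prototype's arithmetic is straightforward: a 60 Hz base frame rate with (presumably) 16 shutter chips per exposure yields 960 Hz, and the second coded observation comes from a subtraction. A toy check follows; the chip count is inferred from 960/60, not stated in this excerpt, and the scalars stand in for full images:

```python
base_fps = 60                      # native camera frame rate (prototype)
chips_per_exposure = 16            # inferred: 960 / 60
effective_fps = base_fps * chips_per_exposure
print(effective_fps)               # 960

# Second coded observation by subtraction (scalar stand-ins for images):
y_blurry, y_coded = 10.0, 6.0
y_coded_2 = y_blurry - y_coded     # complementary coded observation
```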
    Fig. 12. Comparison with previous single-camera CEP (coded exposure photography).
    • Table 1. Results for multiframe restoration on Adobe240 and GoPro. Best results are in bold. Note that known sharp frames are not included in the metrics for VFI methods, whereas all frames are considered for MSRT.


      Entries are PSNR/SSIM; "—" marks values not reported.

      Method | Network inputs | Adobe240 8× | Adobe240 16× | Adobe240 32× | GoPro 8× | GoPro 16× | GoPro 32× | Parameters (M)/Runtime (s)
      SuperSloMo [39] | First and last clear images | 28.64/0.884 | 22.80/0.728 | 20.86/0.602 | 28.98/0.875 | 24.38/0.747 | 20.45/0.618 | 39.6/0.38
      IFRNet [41] | First and last clear images | 30.49/0.916 | 25.59/0.807 | 21.43/0.653 | 29.81/0.893 | 24.38/0.746 | 20.69/0.620 | 19.7/0.42
      GiMM-VFI [42] | First and last clear images | 33.00/0.939 | 26.83/0.837 | 21.93/0.663 | 30.31/0.894 | 25.03/0.752 | 21.56/0.638 | 30.6/0.93
      UPR-Net [40] | First and last clear images | 32.34/0.934 | 26.46/0.823 | 21.84/0.661 | 29.69/0.885 | 24.86/0.749 | 21.50/0.635 | 3.7/0.76
      TimeReplayer [44] | First and last clear images + event data stream | 34.14/0.950 | — | — | 34.02/0.960 | — | — | —
      TimeLens* [43] | First and last clear images + event data stream | 34.45/0.951 | — | — | 34.81/0.959 | 33.21/0.942 | — | 72.2/—
      A2OF [45] | First and last clear images + event data stream | 36.59/0.960 | — | — | 36.61/0.971 | — | — | —
      MSRT (Ours) | Two coded blurry images | 39.08/0.983 | 35.22/0.968 | 33.26/0.958 | 39.42/0.984 | 36.44/0.968 | 32.54/0.939 | 14.9/0.74
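For reference, the PSNR figures throughout these tables follow the standard definition; a minimal NumPy version, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, peak]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)      # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(a, b), 1))   # 20.0
```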
    • Table 2. Comparison with image deblurring methods on GoPro. Best result is in bold, second best in italic.


      Method | Input | Amplitude | Output | PSNR/SSIM
      NAFNet [51] | 1 blurry frame | 7 to 13 frames | 1 sharp frame | 33.69/0.967
      Restormer [46] | 1 blurry frame | 7 to 13 frames | 1 sharp frame | 32.92/0.961
      Stripformer [47] | 1 blurry frame | 7 to 13 frames | 1 sharp frame | 33.08/0.962
      DeepCE [21] | 1 coded frame | 32 frames | 1 sharp frame | 28.10/0.8627
      EFNet [48] | 1 blurry frame + event frames | 7 to 13 frames | 1 sharp frame | 35.46/0.972
      REFID [49] | 1 blurry frame + event frames | 7 to 13 frames | 1 sharp frame | 35.91/0.973
      MSRT (Ours) | 2 coded frames | 8 frames | 8 sharp frames | 39.42/0.984
      MSRT (Ours) | 2 coded frames | 16 frames | 16 sharp frames | 36.44/0.968
      MSRT (Ours) | 2 coded frames | 32 frames | 32 sharp frames | 32.54/0.939
    • Table 3. Comparison for 8× video reconstruction on Set6. Best results are in bold, second best in italic.


      Method | Input | PSNR | SSIM
      UPR-Net [40] | VFI | 31.08 | 0.872
      IFRNet [41] | VFI | 28.50 | 0.813
      EfficientSCI [33] | VSCI | 35.43 | 0.959
      MSRT (Ours) | SpeedShot | 34.72 | 0.925
    • Table 4. Analysis of coding pattern selection on Set6.


      Coding length | PSNR (dB) | Std. dev. of PSNR across patterns
      8× | 34.08 | 0.45
      8×, symmetric | 26.13 | —
      16× | 29.17 | 0.31
      32× | 25.25 | 0.27
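The kinds of patterns compared above can be generated as follows; the generators below are our hedged illustration, not the paper's actual codes: random binary codes with a 50% duty cycle, plus a time-symmetric (palindromic) variant. One plausible reading of the symmetric row's lower PSNR is that a palindromic code gives a video and its time reversal identical observations, making motion direction ambiguous.

```python
import numpy as np

def random_code(length, seed=None):
    """Random binary exposure code with a 50% duty cycle."""
    rng = np.random.default_rng(seed)
    code = np.array([1] * (length // 2) + [0] * (length - length // 2))
    rng.shuffle(code)
    return code

def symmetric_code(length, seed=0):
    """Time-symmetric (palindromic) code: second half mirrors the first."""
    half = random_code(length // 2, seed)
    return np.concatenate([half, half[::-1]])

c = random_code(16, seed=1)
s = symmetric_code(16)
assert c.sum() == 8                 # 50% duty cycle
assert (s == s[::-1]).all()         # palindromic by construction
```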
    • Table 5. Analysis of noise and calibration robustness on 8× Set6.


      Read noise | Shot noise | Timing jitter | PSNR/SSIM
      ✓ | — | — | 34.15/0.904
      — | ✓ | — | 34.25/0.912
      ✓ | ✓ | — | 33.82/0.896
      — | — | 0% to 1% | 34.27/0.921
      — | — | 0% to 3% | 33.52/0.916
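The noise settings above can be emulated with standard sensor models; the sketch below (parameter values are illustrative, not the paper's) applies Poisson shot noise and Gaussian read noise to a clean observation in [0, 1]. Timing jitter would additionally perturb each chip boundary by the stated percentage of the chip duration.

```python
import numpy as np

def add_sensor_noise(y, read_sigma=0.01, photons=1000.0, seed=0):
    """Poisson shot noise (photon counting) + Gaussian read noise."""
    rng = np.random.default_rng(seed)
    shot = rng.poisson(np.clip(y, 0.0, None) * photons) / photons
    return shot + rng.normal(0.0, read_sigma, size=y.shape)

y = np.full((16, 16), 0.5)          # clean, mid-gray observation
y_noisy = add_sensor_noise(y)
assert y_noisy.shape == y.shape
```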
    • Table 6. Ablation on MSRT network design. Best result is in bold.


      SRUN | Error-aware | Motion-guidance | PSNR/SSIM
       |  |  | 33.44/0.915
       |  |  | 33.76/0.917
       |  |  | 34.05/0.920
       |  |  | 34.40/0.923
       |  |  | 34.27/0.921
      ✓ | ✓ | ✓ | 34.72/0.925
    Citation: Yifei Zhang, Xing Liu, Lishun Wang, Ping Wang, Ganzhangqin Yuan, Mu Ku Chen, Kui Jiang, Xin Yuan, Zihan Geng, "High-speed video imaging via multiplexed temporal gradient snapshot," Adv. Photon. Nexus 4, 046017 (2025)

    Paper Information

    Category: Research Articles

    Received: Mar. 26, 2025

    Accepted: Jul. 4, 2025

    Published Online: Aug. 15, 2025

    Author emails: Xin Yuan (xyuan@westlake.edu.cn), Zihan Geng (geng.zihan@sz.tsinghua.edu.cn)

    DOI: 10.1117/1.APN.4.4.046017

    CSTR: 32397.14.1.APN.4.4.046017
