Photonics Research, Volume 12, Issue 11, 2524 (2024)

Predictive pixel-wise optical encoding: towards single-shot high dynamic range moving object recognition

Yutong He1,2, Yu Liang1,2, Honghao Huang1,2, Chengyang Hu1,2, Sigang Yang1,2, and Hongwei Chen1,2,*
Author Affiliations
  • 1Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • 2Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China
    Figures & Tables (14)
    A simplified illustration of the conceptual framework of POE-VP, demonstrated with an HDR motional license plate recognition downstream vision task. POEM (first row, in beige) and CMPM (second row, in purple) are the two core modules that make up POE-VP.
    (a) The normalized artificial highlight map G(x,y), shown as a heat map; the map follows a 2D Gaussian distribution. (b) The frame O(x,y) extracted from an LDR original video in the test dataset. The yellow crossline marks the center of the license plate bounding box, which coincides with the peak of the Gaussian distribution in (a). (c) A figurative visualization of the HDR information map H(x,y). Pixel values over 255 in the deep blue region indicate the created artificial HDR information. (d) The simulated LDR frame.
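    The simulation sketched in (a)–(d) can be illustrated numerically: a 2D Gaussian highlight G(x,y) centered on the license plate is added to the LDR frame O(x,y) to form H(x,y), and clipping at 255 emulates 8-bit sensor saturation. This is a minimal, hypothetical sketch — the function names and the sigma/peak values are illustrative assumptions, not the authors' code:

```python
import numpy as np

def gaussian_highlight(shape, center, sigma, peak):
    """Artificial highlight map G(x, y): a 2D Gaussian peaking at `center`."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = center
    g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    return peak * g / g.max()

def simulate_hdr_and_ldr(original, center, sigma=10.0, peak=512.0):
    """Add the artificial highlight to an LDR frame O(x, y) to form the HDR
    information map H(x, y); clipping H at 255 yields the simulated LDR
    capture, in which the saturated highlight region loses information."""
    G = gaussian_highlight(original.shape, center, sigma, peak)
    H = original.astype(np.float64) + G          # values may exceed 255
    ldr = np.clip(H, 0, 255).astype(np.uint8)    # 8-bit sensor saturation
    return H, ldr
```

A `peak` above 255 guarantees that the HDR map carries values an 8-bit sensor cannot record, which is exactly the artificial HDR information described in (c).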
    An illustration of the simulation pipeline. Exemplary frames of the “JingA 3MU55” video from the test dataset are used as an example. For this example, the preprocessing parameters (E, R) are set to (29, 50). The red arrow indicates the moving direction of the license plate. “T” in the top-left corner of each frame is the frame index of the video. In the simulation experiment, the mask coefficients (a, b) are set to (0.9, 0).
    (a) Simulation results for the “JingA 3MU55” license plate video, from the R50E29 class of the test dataset. Exemplary frames are shown with frame index annotations (T). The “Simulated sensor captures” row shows the simulated captures before the inverse “decoding” process and tone mapping, while the “HDR results” row shows the HDR frames after tone mapping for better human-eye perception. The “Simulated sensor captures” (red font) and “Simulated LDR results” (blue font) are compared quantitatively and qualitatively in this experiment. The red arrow in the first frame of the original scene indicates the moving direction of the license plate. The first two frames of the video are used for initialization. Frames in the orange box are chosen as exemplary frames for calculating the corresponding local information entropy maps. (b) The local information entropy heat maps of the exemplary frames. The license plate regions are zoomed in for better visualization and comparison. See Visualization 1 for video results.
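    The local information entropy maps used here and in the later figures can be computed, under common assumptions (8-bit grayscale input, a square sliding window, base-2 logarithm — the window size below is an illustrative choice, not one stated in this listing), roughly as:

```python
import numpy as np

def local_entropy(img, win=9):
    """Local Shannon entropy (bits) of an 8-bit grayscale image over a
    sliding win x win window; higher values indicate regions retaining
    more texture/information (e.g., a readable license plate)."""
    h, w = img.shape
    r = win // 2
    padded = np.pad(img, r, mode="reflect")
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + win, j:j + win]
            counts = np.bincount(patch.ravel(), minlength=256)
            p = counts[counts > 0] / patch.size
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```

A saturated (clipped) plate region collapses to a few gray levels and thus low entropy, which is why the entropy heat maps separate the LDR and HDR results so clearly.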
    Exemplary frames from comparison simulations with three existing methods. The results of our POE-VP are highlighted in red. Parts of the license plate are zoomed in for a detailed comparison of motion artifacts between the multi-shot methods and the proposed single-shot POE-VP. Note that the HDR results of all four methods are tone mapped with the same approach for display.
    (a) Schematic of the optical layout inside the DMD prototype. (b) The DMD prototype. (c) A photograph of the light box in the experimental setup. The red arrow shows the moving direction of the license plate. (d) An overall view of the experimental setup.
    (a) The LDR capture sequence. The red arrow shows the moving direction of the license plate. (b) The grayscale masks predicted by VP. In the real-scene experiment, we set the mask coefficients (Vs, a, b) to (200, 0.7, 0.2). (c), (d) Schematic of the PWM and trigger mode of the DMD. The DMD refreshes the binary patterns at each rising edge and generates the grayscale masks within each tD; for the remaining time, fully white patterns are loaded (the DMD acts as a reflective mirror when all micromirrors are “on”). (e) The exposure periods of the image sensor. tI is the effective exposure time of each single shot. (f) The HDR captures acquired by the prototype with the DMD on. The first two frames are used for initialization.
    (a) Hardware experiment results for the “JingN F9H28” motional license plate scene. (b) Hardware experiment results for the “JingL J7629” motional license plate scene. Exemplary frames of the two result videos are shown with frame index annotations (T). The red arrow indicates the moving direction of the license plate. The first two frames of each video are used for initialization. Frames in the orange box are chosen as exemplary frames for calculating the corresponding local information entropy maps. See Visualization 1 for video results.
    (a) The local information entropy heat maps of the exemplary frames in the “JingN F9H28” license video. (b) The local information entropy heat maps of the exemplary frames in the “JingL J7629” license video. The HDR and LDR license plate results are shown alongside the corresponding entropy maps for comparison.
    (a) The high-illuminance scene of the additional experiment. (b) The arrangement of the two compared LDR capture devices, with the POE-VP prototype in the orange dashed box and the LDR camera in the blue dashed box. (c) The LDR images captured by the two devices. (d) The local information entropy maps of the two LDR captures.
    • Table 1. Quantitative Results of Each Test Video^a

      | Classes | Total Video Count | LDR Rec-Acc | HDR Rec-Acc | LDR Entropy | HDR Entropy | Time/ms |
      |---------|-------------------|-------------|-------------|-------------|-------------|---------|
      | R50E9   | 13                | 15.62%      | 93.93%      | 1.3980      | 7.2446      | 15.566  |
      | R50E29  | 7                 | 10.11%      | 95.58%      | 2.4306      | 5.8904      | 16.766  |
      | R50E49  | 12                | 20.21%      | 95.46%      | 2.4690      | 4.9817      | 18.497  |
      | Average | 32                | 16.14%      | 94.86%      | 2.0255      | 6.0998      | 16.928  |
    • Table 2. Quantitative Results of Comparison Simulation^a

      | Capture Mode                      | Method                             | Average Recognition Accuracy | Average Running Time/ms |
      |-----------------------------------|------------------------------------|------------------------------|-------------------------|
      | Multi-shot (at least 3 shots)     | Exposure fusion (classical)        | 65.07%                       | 135.261                 |
      | Multi-shot (at least 3 shots)     | Exposure fusion (artifact reduced) | 88.56%                       | 424.831                 |
      | Single-shot (with physical setup) | Deep optics                        | 21.64%                       | 1161.458                |
      | Single-shot (with physical setup) | POE-VP (ours)                      | 94.86%                       | 16.928                  |
    • Table 3. Main Parameters of the DMD and Image Sensor

      | Hardware     | Type                 | Resolution | Pixel Size |
      |--------------|----------------------|------------|------------|
      | DMD          | ViALUX V-9001        | 2560×1600  | 7.6 μm     |
      | Image sensor | FLIR GS3-U3-123S6M-C | 2560×1600  | 3.45 μm    |
    • Table 4. Quantitative Results of Two Real HDR License Plate Recognition Scenes^a

      | License Plate | Captured Frame Count | LDR Rec-Acc | HDR Rec-Acc | LDR Entropy | HDR Entropy | Time/ms |
      |---------------|----------------------|-------------|-------------|-------------|-------------|---------|
      | JingN F9H28   | 28                   | 0%          | 92.86%      | 1.2018      | 6.9323      | 16.807  |
      | JingL J7629   | 39                   | 0%          | 97.44%      | 1.2038      | 7.2774      | 14.317  |
      | Average       | –                    | 0%          | 95.15%      | 1.2028      | 7.1049      | 15.562  |
    Yutong He, Yu Liang, Honghao Huang, Chengyang Hu, Sigang Yang, Hongwei Chen, "Predictive pixel-wise optical encoding: towards single-shot high dynamic range moving object recognition," Photonics Res. 12, 2524 (2024)

    Paper Information

    Category: Imaging Systems, Microscopy, and Displays

    Received: Jun. 24, 2024

    Accepted: Aug. 11, 2024

    Published Online: Oct. 31, 2024

    The Author Email: Hongwei Chen (chenhw@tsinghua.edu.cn)

    DOI:10.1364/PRJ.533288

    CSTR:32188.14.PRJ.533288
