Journal of Infrared and Millimeter Waves, Volume 41, Issue 5, 914 (2022)

An ultra-efficient streaming-based FPGA accelerator for infrared target detection

Shao-Yi CHEN1,2,3,4, Xin-Yi TANG2,3,4, Jian WANG2,3,4, Jing-Si HUANG1,2,3,4, and Zheng LI2,3,4,*
Author Affiliations
  • 1School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
  • 2Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
  • 3University of Chinese Academy of Sciences, Beijing 100049, China
  • 4Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China
    Figures & Tables(17)
    Network structure of the deep-learning-based infrared target detection algorithm
    SkyNet object detection result on FLIR dataset
    Concepts of initiation interval and latency
    Accelerator design for balancing all pipeline stages
    FPGA inference accelerator architecture
    Datapath of pointwise convolution
    Datapath of depthwise convolution
    Using a line buffer to optimize the datapath
    Datapath of maxpool
    DSP48E2 slice architecture
    Datapath of the processing element array
    System optimization
    • Table 0. [in Chinese]

      Algorithm 1 Pseudocode for Pointwise Convolution Layer

      Input: in<N_IN × BIT_IN>: feature map input
      weight<N_OUT × N_IN × BIT_WT>[N_OCH / N_OUT][N_ICH / N_IN]: weights of the neural network
      N_IN: input parallelism factor
      N_OUT: output parallelism factor
      N_ICH: number of input channels
      N_OCH: number of output channels
      BIT_IN: bit width of input
      BIT_WT: bit width of weight
      BIT_OUT: bit width of output

      #pragma HLS DATAFLOW
      for fo = 0; fo < N_OCH / N_OUT; ++fo do
        for fi = 0; fi < N_ICH / N_IN; ++fi do
        #pragma HLS PIPELINE II=1
          for i = 0; i < N_IN; ++i do
          #pragma HLS UNROLL
            for o = 0; o < N_OUT; ++o do
              out += in * weight[fo][fi];
            end for
          end for
        end for
      end for

      Output: out: feature map output
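
A small software reference model can help check the loop nest of Algorithm 1. The sketch below is plain Python, not HLS C++; the function name `pointwise_conv`, the nested-list layout `weight[fo][fi][o][i]`, and the single-pixel framing are assumptions made for illustration and are not taken from the paper. The HLS pragmas have no software equivalent and appear only as comments.

```python
def pointwise_conv(in_tiles, weight, n_in, n_out):
    """Reference model of a 1x1 (pointwise) convolution for one pixel.

    in_tiles: list of N_ICH / N_IN tiles, each holding n_in activations.
    weight:   nested list indexed as weight[fo][fi][o][i] (assumed layout),
              shaped [N_OCH / N_OUT][N_ICH / N_IN][n_out][n_in].
    Returns a flat list of N_OCH accumulated outputs.
    """
    out = []
    for fo in range(len(weight)):          # output-channel tiles
        acc = [0] * n_out                  # one accumulator per parallel output
        for fi in range(len(in_tiles)):    # input-channel tiles (PIPELINE II=1 in HLS)
            for i in range(n_in):          # fully unrolled in hardware
                for o in range(n_out):     # fully unrolled in hardware
                    acc[o] += in_tiles[fi][i] * weight[fo][fi][o][i]
        out.extend(acc)
    return out


# Tiny example: N_ICH = 4, N_OCH = 2, N_IN = 2, N_OUT = 1.
x = [[1, 2], [3, 4]]                       # 4 input channels split into 2 tiles
w = [
    [[[1, 0]], [[0, 1]]],                  # output 0 = channel 0 + channel 3
    [[[1, 1]], [[1, 1]]],                  # output 1 = sum of all channels
]
y = pointwise_conv(x, w, n_in=2, n_out=1)
print(y)
```

In this model the two inner loops correspond to the N_IN × N_OUT multipliers that the unrolled hardware would evaluate in the same cycle.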

    • Table 0. [in Chinese]

      Algorithm 2 Pseudocode for Depthwise Convolution Layer

      Input: in: feature map input
      weight[N_CH / N_IO][9]: weights of the neural network
      N_IO: input parallelism factor
      N_CH: number of input channels
      BIT_IN: bit width of input
      BIT_WT: bit width of weight
      BIT_OUT: bit width of output

      #pragma HLS DATAFLOW
      for f = 0; f < N_CH / N_IO; ++f do
        for k = 0; k < 9; ++k do
          #pragma HLS PIPELINE II=1
          wt_buf = weight[f][k]
          for i = 0; i < …; ++i do
            #pragma HLS UNROLL
            for o = 0; o < …; ++o do
              out += in * wt_buf
            end for
          end for
        end for
      end for

      Output: out: feature map output
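
The depthwise loop nest of Algorithm 2 can be modelled in software in the same spirit. The inner loop bounds are not legible in the listing, so this Python sketch assumes a single inner loop over the N_IO channels processed in parallel, which matches the fact that a depthwise convolution filters each channel independently. The function name `depthwise_conv3x3` and the layouts `win_tiles[f][k][j]` and `weight[f][k][j]` are hypothetical.

```python
def depthwise_conv3x3(win_tiles, weight, n_io):
    """Reference model of a 3x3 depthwise convolution at one output pixel.

    win_tiles[f][k][j]: window value at tap k (0..8) for channel j of tile f.
    weight[f][k][j]:    kernel tap k for channel j of tile f (assumed layout).
    Returns one accumulated output per channel, N_CH values in total.
    """
    out = []
    for f in range(len(weight)):           # channel tiles: N_CH / N_IO
        acc = [0] * n_io                   # one accumulator per parallel channel
        for k in range(9):                 # 3x3 kernel taps (PIPELINE II=1 in HLS)
            wt_buf = weight[f][k]          # n_io weights for this tap
            for j in range(n_io):          # unrolled in hardware (assumed bound)
                acc[j] += win_tiles[f][k][j] * wt_buf[j]
        out.extend(acc)
    return out


# Tiny example: N_CH = 2, N_IO = 2, so a single channel tile.
windows = [[[1, k] for k in range(9)]]     # channel 0 sees all ones, channel 1 sees 0..8
kernels = [[[1, 2] for _ in range(9)]]     # channel 0 taps are 1, channel 1 taps are 2
z = depthwise_conv3x3(windows, kernels, n_io=2)
print(z)
```

Unlike the pointwise case, no products are summed across channels, which is why a single parallelism factor is enough here.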

    • Table 1. SkyNet parameters and performance comparison with the classical network on DAC-SDC dataset

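      | Net name  | ResNet-18 | ResNet-34 | ResNet-50 | VGG-16  | SkyNet |
      |-----------|-----------|-----------|-----------|---------|--------|
      | Parameter | 11.18 M   | 21.28 M   | 23.51 M   | 14.71 M | 0.44 M |
      | IoU       | 0.61      | 0.26      | 0.32      | 0.25    | 0.73   |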
    • Table 2. SkyNet’s parallelism factors for each layer

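      | Layer | Type | K | C    | FM      | #MAC      | PF  |
      |-------|------|---|------|---------|-----------|-----|
      | Total |      |   |      |         | 465100800 | 764 |
      | 1     | DW   | 3 | 3    | 160×320 | 1382400   | 3   |
      | 2     | PW   | 1 | 48   | 160×320 | 7372800   | 12  |
      | 3     | DW   | 3 | 48   | 80×160  | 5529600   | 12  |
      | 4     | PW   | 1 | 96   | 80×160  | 58982400  | 96  |
      | 5     | DW   | 3 | 96   | 40×80   | 2764800   | 6   |
      | 6     | PW   | 1 | 192  | 40×80   | 58982400  | 96  |
      | 7     | DW   | 3 | 192  | 20×40   | 2764800   | 3   |
      | 8     | PW   | 1 | 384  | 20×40   | 58982400  | 96  |
      | 9     | DW   | 3 | 384  | 20×40   | 2764800   | 6   |
      | 10    | PW   | 1 | 512  | 20×40   | 157286400 | 256 |
      | 11    | DW   | 3 | 1280 | 20×40   | 9216000   | 16  |
      | 12    | PW   | 1 | 96   | 20×40   | 98304000  | 160 |
      | 13    | PW   | 1 | 10   | 20×40   | 768000    | 2   |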
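
The Total row of Table 2 can be cross-checked against the per-layer rows. The Python sketch below sums the #MAC and PF columns and divides #MAC by PF for each layer. Reading that quotient as a per-frame cycle count assumes each parallel unit completes one MAC per cycle, which is an idealization introduced here and not a figure from the paper; the names `LAYERS` and `macs_per_pf` are hypothetical.

```python
# (layer, type, #MAC, PF) copied from Table 2.
LAYERS = [
    (1, "DW", 1382400, 3),
    (2, "PW", 7372800, 12),
    (3, "DW", 5529600, 12),
    (4, "PW", 58982400, 96),
    (5, "DW", 2764800, 6),
    (6, "PW", 58982400, 96),
    (7, "DW", 2764800, 3),
    (8, "PW", 58982400, 96),
    (9, "DW", 2764800, 6),
    (10, "PW", 157286400, 256),
    (11, "DW", 9216000, 16),
    (12, "PW", 98304000, 160),
    (13, "PW", 768000, 2),
]

total_macs = sum(macs for _, _, macs, _ in LAYERS)
total_pf = sum(pf for _, _, _, pf in LAYERS)

# MACs handled by each parallel unit per frame (idealized cycles per frame).
macs_per_pf = {layer: macs / pf for layer, _, macs, pf in LAYERS}

print(total_macs, total_pf)
for layer, value in macs_per_pf.items():
    print(layer, value)
```

In a streaming pipeline the stage with the largest quotient limits the frame rate, so comparing these values indicates how evenly the parallelism factors spread the work across layers.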
    • Table 3. Comparison with DAC-SDC accelerator design

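      |                  | iSmart | BJUT Runner | SkrSkr | Our work |
      |------------------|--------|-------------|--------|----------|
      | Model            | SkyNet | UltraNet    | SkyNet | SkyNet   |
      | # of MACs        | 465M   | 272M        | 465M   | 465M     |
      | # of PFs         | 256    | 448         | 512    | 764      |
      | Frequency (MHz)  | 220    | 166         | 333    | 350      |
      | BRAMs            | 209    | 150.5       | 209    | 206.5    |
      | DSPs             | 329    | 360         | 329    | 360      |
      | LUTs             | 53809  | 44633       | 52875  | 50518    |
      | FFs              | 55833  | 58813       | 55278  | 40488    |
      | Precision (W, A) | 11, 9  | 4, 4        | 6, 8   | 5, 8     |
      | IoU              | 73.1%  | 65.6%       | 73.1%  | 72.3%    |
      | Throughput (FPS) | 25     | 213         | 52     | 551      |
      | Power (W)        | 13.5   | 6.66        | 6.7    | 8.4      |
      | Energy (mJ/img)  | 540    | 31          | 128    | 15.2     |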
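
The Energy row of Table 3 is consistent with dividing power by throughput. The short Python sketch below recomputes it from the Power and Throughput rows; the names `DESIGNS` and `energy_mj_per_image` are hypothetical, and the recomputed values differ from the table only by rounding.

```python
# (power in W, throughput in frames per second) copied from Table 3.
DESIGNS = {
    "iSmart": (13.5, 25),
    "BJUT Runner": (6.66, 213),
    "SkrSkr": (6.7, 52),
    "Our work": (8.4, 551),
}


def energy_mj_per_image(power_w, fps):
    """Energy per processed image in millijoules: (J/s) / (img/s) * 1000."""
    return power_w / fps * 1000.0


energy = {name: energy_mj_per_image(p, f) for name, (p, f) in DESIGNS.items()}
for name, value in energy.items():
    print(f"{name}: {value:.1f} mJ/img")
```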
    Citation

    Shao-Yi CHEN, Xin-Yi TANG, Jian WANG, Jing-Si HUANG, Zheng LI. An ultra-efficient streaming-based FPGA accelerator for infrared target detection[J]. Journal of Infrared and Millimeter Waves, 2022, 41(5): 914

    Paper Information

    Category: Research Articles

    Received: Jan. 13, 2022

    Accepted: --

    Published Online: Feb. 6, 2023

    Author email: Zheng LI (lizheng_sitp@163.com)

    DOI:10.11972/j.issn.1001-9014.2022.05.016
