Photonics Research, Volume 9, Issue 3, B45 (2021)

Monte Carlo simulation fused with target distribution modeling via deep reinforcement learning for automatic high-efficiency photon distribution estimation

Jianhui Ma¹, Zun Piao¹, Shuang Huang¹, Xiaoman Duan¹, Genggeng Qin¹, Linghong Zhou¹,²,*, and Yuan Xu¹,³,*
Author Affiliations
  • 1School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
  • 2e-mail: smart@smu.edu.cn
  • 3e-mail: yuanxu@smu.edu.cn
    Figures & Tables (16)
    Figure 1. Automatic scatter estimation framework. The MC algorithm generates raw scatter signals according to the X-ray source energy spectrum and the system geometry configuration. The DRL scheme (denoted by the dashed black arrow) employs a deep Q-network that interacts with the statistical distribution model to yield a satisfactory scatter image.
    Figure 2. Network architecture in the DDQN. The network takes a scatter image as input and predicts three possible actions for parameter adjustment. The numbers at the top denote the feature map size and channel number, and the operations for each layer are listed at the bottom. For instance, the first hidden layer convolves the input with 16 filters of size 3×3 and stride 4, followed by a rectified linear unit (ReLU) activation, and the output layer is a fully connected linear layer with three outputs. (A minimal code sketch of this architecture is given below, after the figure captions.)
    Figure 3. (a) Primary projection of the head and neck (H&N) patient; (b)–(i) raw scatter projections separately calculated by the MC particle sampling algorithm with 5×10⁵, 1×10⁶, 5×10⁶, 1×10⁷, 1×10⁸, 1×10⁹, 1×10¹⁰, and 1×10¹² source photons for the same projection angle.
    Figure 4. (a)–(g) Scatter images of Figs. 3(b)–3(h) smoothed by the over-relaxation smoothing algorithm; (h) corresponds to Fig. 3(i), which is treated as a noise-free scatter image and used as the ground truth.
    Figure 5. Intensity profiles of Fig. 4 along the (a) horizontal and (b) vertical directions, as denoted by the orange lines in Fig. 4(h).
    Figure 6. From top to bottom: six testing results with 5×10⁵, 1×10⁶, 5×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹ source photons. From left to right: primary signals, smoothed scatter signals restored by the over-relaxation algorithm with empirical parameters, smoothed scatter signals restored by the proposed framework, and the ground truth.
    Figure 7. (a)–(d) Intensity profiles of the first, second, third, and last rows in Fig. 6. The profile locations are denoted by orange lines in the last column of Fig. 6.
    Figure 8. (a)–(c) Boxplots of the metric differences in SSIM, PSNR, and RAE between Empirical and ASEF for all testing cases, where metric_diff = metric_Empirical − metric_ASEF and metric denotes SSIM, PSNR, or RAE. (d) Boxplot of the SSIM comparison between Empirical and ASEF.
    Figure 9. Automatic scatter estimation process for a testing case. (a)–(c) Smoothed scatter images at Steps 1, 7, and 13, respectively. (d) and (e) separately plot the SSIM and RAE over steps.
    Figure 10. Different scatter images. From left to right: scatter projection input, the ground truth of the scatter image in the first column, and Grad-CAM heatmaps of the three subnetworks {W_k, W_ω, W_β}.
    Figure 11. From top to bottom: four prostate cases with 5×10⁵, 1×10⁶, 5×10⁶, and 1×10⁷ source photons. From left to right: primary signals, smoothed scatter signals restored by the over-relaxation algorithm with empirical parameters, smoothed scatter signals restored by the proposed framework, and the ground truth.
    Figure 12. (a)–(d) Intensity profiles of the four prostate cases presented in Fig. 11. Profile locations are outlined by orange lines in the last column of Fig. 11.
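    The caption of Fig. 2 fixes only a few architectural details: a scatter image as input, a first hidden layer of 16 filters of size 3×3 with stride 4 followed by ReLU, and a fully connected output layer with three outputs (one Q-value per action). The sketch below is one minimal way to realize such a subnetwork in PyTorch; the framework choice, intermediate layers, pooling, and channel widths are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of one Q-subnetwork from the Fig. 2 caption. Only the first
# convolution (16 filters, 3x3, stride 4), the ReLU activations, and the
# 3-output fully connected head come from the caption; everything else is an
# assumption made to keep the example self-contained.
import torch
import torch.nn as nn

class QSubnetwork(nn.Module):
    def __init__(self, n_actions: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=4), nn.ReLU(),   # stated first hidden layer
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),  # assumed middle layers
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                                # input-size-agnostic pooling
        )
        self.head = nn.Linear(32, n_actions)  # stated 3-output linear layer

    def forward(self, scatter_image: torch.Tensor) -> torch.Tensor:
        x = self.features(scatter_image)        # (N, 32, 1, 1)
        return self.head(torch.flatten(x, 1))   # Q-values for the three actions

# One such subnetwork per adjustable parameter, matching {W_k, W_omega, W_beta} in Table 1.
q_nets = {name: QSubnetwork() for name in ("k", "omega", "beta")}
```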
    • Table 1. DDQN Training Process

      1.  Initialize main network weights W and target network weights Ŵ
      2.  For episode = 1, 2, …, N_episode do
      3.    For projection = 1, 2, …, N_prj do
      4.      Initialize {k_0, ω_0, β_0}
      5.      Generate s_1 using Eq. (10) with {k_0, ω_0, β_0}
      6.      For t = 1, 2, …, N_step do
      7.        Randomly select one subnetwork from {W_k, W_ω, W_β}
      8.        With probability ε select action a_t randomly
      9.        Otherwise choose a_t = argmax_a [Q_π(s_t, a; W)]
      10.       Adjust parameters {k_t, ω_t, β_t} according to a_t
      11.       Generate s_{t+1} using Eq. (10) with {k_t, ω_t, β_t}
      12.       Compute reward r_t using Eq. (19)
      13.       Store the transition {s_t, a_t, r_t, s_{t+1}} in experience replay D
      14.       Randomly sample a mini-batch from D
      15.       Compute the gradient of the loss function in Eq. (17)
      16.       Update main network weights W = {W_k, W_ω, W_β}
      17.       Every N_update steps, set Ŵ = W
      18.      End For
      19.    End For
      20.  End For
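
      Read as code, Table 1 is a standard double-DQN loop: ε-greedy action selection, an experience replay buffer D, a main network updated by gradient descent, and a target network synchronized every N_update steps. The sketch below mirrors that structure in PyTorch under stated assumptions: the pieces tied to the paper's Eqs. (10), (17), and (19) (generating a smoothed scatter image from {k, ω, β} and computing the reward) are hidden behind a hypothetical `env` object, the per-step choice among the three subnetworks (step 7) is omitted, and a Huber loss stands in for the loss of Eq. (17). Default values follow Table 2 below.

```python
# Sketch of the Table 1 training loop (double DQN with experience replay).
# `q_net`/`target_net` are networks like QSubnetwork above; `env` is a
# hypothetical wrapper around Eqs. (10) and (19): env.reset() returns s_1 and
# env.step(a) returns (s_{t+1}, r_t).
import random
from collections import deque
import torch
import torch.nn.functional as F

def train_ddqn(q_net, target_net, env, optimizer,
               n_episode=100, n_prj=45, n_step=30, n_update=20,
               capacity=2000, batch_size=64, gamma=0.6, epsilon=0.1):
    replay = deque(maxlen=capacity)                    # experience replay D
    target_net.load_state_dict(q_net.state_dict())     # step 1: W_hat = W
    total_steps = 0
    for episode in range(n_episode):
        for prj in range(n_prj):
            state = env.reset()                        # s_1 from Eq. (10) with {k_0, w_0, b_0}
            for t in range(n_step):
                # steps 8-9: epsilon-greedy action selection
                if random.random() < epsilon:
                    action = random.randrange(3)
                else:
                    with torch.no_grad():
                        action = q_net(state).argmax(dim=1).item()
                next_state, reward = env.step(action)  # steps 10-12: Eqs. (10) and (19)
                replay.append((state, action, reward, next_state))
                state = next_state
                # steps 14-16: sample a mini-batch and update the main network
                if len(replay) >= batch_size:
                    s, a, r, s2 = zip(*random.sample(replay, batch_size))
                    s, s2 = torch.cat(s), torch.cat(s2)
                    a = torch.tensor(a)
                    r = torch.tensor(r, dtype=torch.float32)
                    with torch.no_grad():
                        # double-DQN target: main net selects the action, target net evaluates it
                        a_star = q_net(s2).argmax(dim=1, keepdim=True)
                        y = r + gamma * target_net(s2).gather(1, a_star).squeeze(1)
                    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                    loss = F.smooth_l1_loss(q, y)       # stand-in for the Eq. (17) loss
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                total_steps += 1
                if total_steps % n_update == 0:         # step 17: W_hat = W
                    target_net.load_state_dict(q_net.state_dict())
```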
    • Table 2. Parameters in the DDQN Training Phase

      Parameter | Value | Description
      N_episode | 100 | Number of training episodes
      N_prj | 45 | Number of training projections
      N_step | 30 | Number of steps for each episode
      N_update | 20 | Number of steps for target network weights update
      D | 2000 | Capacity of experience replay memory
      ε | [0.01, 1] | Probability of random action in ε-greedy algorithm
      γ | 0.6 | Discount factor
      lr | 0.001 | Learning rate of gradient descent for main network
      N_batch | 64 | Mini-batch samples for network training
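
      Note that ε is listed as a range, [0.01, 1], which implies the exploration probability is annealed during training rather than kept constant. The exact decay rule is not given in this listing; a simple linear schedule over the N_episode × N_prj × N_step training steps of Table 2 could look like the sketch below (the linear form is an assumption), and its output would replace the fixed `epsilon` argument in the training sketch above.

```python
# Linear annealing of the exploration probability from 1.0 down to 0.01 over
# the training budget implied by Table 2 (N_episode * N_prj * N_step steps).
# The linear form is an assumption; only the [0.01, 1] range comes from the table.
def epsilon_at(step: int,
               eps_start: float = 1.0,
               eps_end: float = 0.01,
               decay_steps: int = 100 * 45 * 30) -> float:
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```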
    • Table 3. SSIM, PSNR, and RAE Statistics among All Testing Cases (each entry is avg. ± std.; SSIM: 1 = best)

      Photon Number | SSIM, Empirical | SSIM, ASEF | PSNR (dB), Empirical | PSNR (dB), ASEF | RAE (%), Empirical | RAE (%), ASEF
      5×10⁵ | 0.79 ± 4.70×10⁻² | 0.94 ± 2.36×10⁻² | 21.54 ± 0.85 | 26.55 ± 1.34 | 12.03 ± 1.27×10⁻² | 5.62 ± 1.27×10⁻²
      1×10⁶ | 0.88 ± 3.73×10⁻² | 0.96 ± 1.67×10⁻² | 23.99 ± 0.72 | 29.05 ± 1.22 | 8.52 ± 9.65×10⁻³ | 4.22 ± 6.53×10⁻³
      5×10⁶ | 0.97 ± 8.83×10⁻³ | 0.99 ± 3.85×10⁻³ | 30.26 ± 0.91 | 33.76 ± 1.03 | 3.81 ± 4.69×10⁻³ | 2.42 ± 3.25×10⁻³
      1×10⁷ | 0.98 ± 4.31×10⁻³ | 0.99 ± 2.02×10⁻³ | 33.19 ± 0.83 | 36.05 ± 0.89 | 2.68 ± 3.14×10⁻³ | 1.87 ± 2.35×10⁻³
      1×10⁸ | 0.99 ± 4.96×10⁻⁴ | 0.99 ± 3.97×10⁻⁴ | 43.03 ± 0.82 | 43.96 ± 0.73 | 0.84 ± 9.31×10⁻⁴ | 0.74 ± 7.36×10⁻⁴
      1×10⁹ | 0.99 ± 4.84×10⁻⁵ | 0.99 ± 4.64×10⁻⁵ | 52.97 ± 0.91 | 53.12 ± 0.89 | 0.27 ± 3.26×10⁻⁴ | 0.26 ± 3.06×10⁻⁴
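
      For reference, the Table 3 metrics can be computed for a restored scatter image against the high-photon-count ground truth roughly as follows. SSIM and PSNR are taken from scikit-image; the RAE line uses one common definition (the sum of absolute errors divided by the sum of absolute reference values, in percent), since the paper's exact formula is not reproduced in this listing.

```python
# Rough evaluation of one restored scatter image against the ground truth.
# The RAE definition here is an assumption; SSIM and PSNR follow scikit-image.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(restored: np.ndarray, reference: np.ndarray) -> dict:
    data_range = float(reference.max() - reference.min())
    return {
        "SSIM": structural_similarity(reference, restored, data_range=data_range),
        "PSNR_dB": peak_signal_noise_ratio(reference, restored, data_range=data_range),
        "RAE_%": 100.0 * np.abs(restored - reference).sum() / np.abs(reference).sum(),
    }
```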
    • Table 4. Computation Time for One Scatter Image of a Prostate Patient across Different Photon Numbers

      Photon Number | MC (s) | DRL (s) | Total (s)
      5×10⁵ | 0.43 | 8.98 | 9.41
      1×10⁶ | 0.45 | 4.80 | 5.25
      5×10⁶ | 0.57 | 1.94 | 2.51
      1×10⁷ | 0.83 | 0.98 | 1.81
      1×10⁸ | 5.94 | 0.32 | 6.26
      1×10⁹ | 60.00 | 0.29 | 60.29
      1×10¹⁰ | 633.95 | 0.29 | 634.24
      1×10¹¹ | 6402.60 | 0.29 | 6402.89
    Paper Information

    Special Issue: DEEP LEARNING IN PHOTONICS

    Received: Oct. 26, 2020

    Accepted: Dec. 21, 2020

    Published Online: Feb. 24, 2021

    Author e-mails: Linghong Zhou (smart@smu.edu.cn), Yuan Xu (yuanxu@smu.edu.cn)

    DOI: 10.1364/PRJ.413486
